1
Cunha I, Latron E, Bauer S, Sage D, Griffié J. Machine learning in microscopy - insights, opportunities and challenges. J Cell Sci 2024; 137:jcs262095. [PMID: 39465533 DOI: 10.1242/jcs.262095] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 10/29/2024]
Abstract
Machine learning (ML) is transforming the field of image processing and analysis, from automation of laborious tasks to open-ended exploration of visual patterns. This has striking implications for image-driven life science research, particularly microscopy. In this Review, we focus on the opportunities and challenges associated with applying ML-based pipelines for microscopy datasets from a user point of view. We investigate the significance of different data characteristics - quantity, transferability and content - and how this determines which ML model(s) to use, as well as their output(s). Within the context of cell biological questions and applications, we further discuss ML utility range, namely data curation, exploration, prediction and explanation, and what they entail and translate to in the context of microscopy. Finally, we explore the challenges, common artefacts and risks associated with ML in microscopy. Building on insights from other fields, we propose how these pitfalls might be mitigated in microscopy.
Affiliation(s)
- Inês Cunha
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Tomtebodavägen 23, 171 65 Solna, Sweden
- Emma Latron
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Tomtebodavägen 23, 171 65 Solna, Sweden
- Sebastian Bauer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Tomtebodavägen 23, 171 65 Solna, Sweden
- Daniel Sage
- Biomedical Imaging Group and EPFL Center for Imaging, École Polytechnique, Rte Cantonale, 1015 Lausanne, Switzerland
- Juliette Griffié
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Tomtebodavägen 23, 171 65 Solna, Sweden
2
Carreras-Puigvert J, Spjuth O. Artificial intelligence for high content imaging in drug discovery. Curr Opin Struct Biol 2024; 87:102842. [PMID: 38797109 DOI: 10.1016/j.sbi.2024.102842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 04/28/2024] [Accepted: 04/29/2024] [Indexed: 05/29/2024]
Abstract
Artificial intelligence (AI) and high-content imaging (HCI) are contributing to advancements in drug discovery, propelled by the recent progress in deep neural networks. This review highlights AI's role in analysis of HCI data from fixed and live-cell imaging, enabling novel label-free and multi-channel fluorescent screening methods, and improving compound profiling. HCI experiments are rapid and cost-effective, facilitating large data set accumulation for AI model training. However, the success of AI in drug discovery also depends on high-quality data, reproducible experiments, and robust validation to ensure model performance. Despite challenges like the need for annotated compounds and managing vast image data, AI's potential in phenotypic screening and drug profiling is significant. Future improvements in AI, including increased interpretability and integration of multiple modalities, are expected to solidify AI and HCI's role in drug discovery.
Affiliation(s)
- Jordi Carreras-Puigvert
- Department of Pharmaceutical Biosciences and Science for Life Laboratories, Uppsala University, Sweden.
- Ola Spjuth
- Department of Pharmaceutical Biosciences and Science for Life Laboratories, Uppsala University, Sweden.
3
Ertürk A. Deep 3D histology powered by tissue clearing, omics and AI. Nat Methods 2024; 21:1153-1165. [PMID: 38997593 DOI: 10.1038/s41592-024-02327-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/24/2022] [Accepted: 05/28/2024] [Indexed: 07/14/2024]
Abstract
To comprehensively understand tissue and organism physiology and pathophysiology, it is essential to create complete three-dimensional (3D) cellular maps. These maps require structural data, such as the 3D configuration and positioning of tissues and cells, and molecular data on the constitution of each cell, spanning from the DNA sequence to protein expression. While single-cell transcriptomics is illuminating the cellular and molecular diversity across species and tissues, the 3D spatial context of these molecular data is often overlooked. Here, I discuss emerging 3D tissue histology techniques that add the missing third spatial dimension to biomedical research. Through innovations in tissue-clearing chemistry, labeling and volumetric imaging that enhance 3D reconstructions and their synergy with molecular techniques, these technologies will provide detailed blueprints of entire organs or organisms at the cellular level. Machine learning, especially deep learning, will be essential for extracting meaningful insights from the vast data. Further development of integrated structural, molecular and computational methods will unlock the full potential of next-generation 3D histology.
Affiliation(s)
- Ali Ertürk
- Institute for Tissue Engineering and Regenerative Medicine, Helmholtz Zentrum München, Neuherberg, Germany.
- Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University, Munich, Germany.
- School of Medicine, Koç University, İstanbul, Turkey.
- Deep Piction GmbH, Munich, Germany.
4
Spicher N, Wesemeyer T, Deserno TM. Crowdsourcing image segmentation for deep learning: integrated platform for citizen science, paid microtask, and gamification. Biomed Tech (Berl) 2024; 69:293-305. [PMID: 38143326 DOI: 10.1515/bmt-2023-0148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 11/30/2023] [Indexed: 12/26/2023]
Abstract
OBJECTIVES Segmentation is crucial in medical imaging. Deep learning based on convolutional neural networks showed promising results. However, the absence of large-scale datasets and a high degree of inter- and intra-observer variations pose a bottleneck. Crowdsourcing might be an alternative, as many non-experts provide references. We aim to compare different types of crowdsourcing for medical image segmentation. METHODS We develop a crowdsourcing platform that integrates citizen science (incentive: participating in the research), paid microtask (incentive: financial reward), and gamification (incentive: entertainment). For evaluation, we choose the use case of sclera segmentation in fundus images as a proof-of-concept and analyze the accuracy of crowdsourced masks and the generalization of learning models trained with crowdsourced masks. RESULTS The developed platform is suited for the different types of crowdsourcing and offers an easy and intuitive way to implement crowdsourcing studies. Regarding the proof-of-concept study, citizen science, paid microtask, and gamification yield a median F-score of 82.2, 69.4, and 69.3 % compared to expert-labeled ground truth, respectively. Generating consensus masks improves the gamification masks (78.3 %). Despite the small training data (50 images), deep learning reaches median F-scores of 80.0, 73.5, and 76.5 % for citizen science, paid microtask, and gamification, respectively, indicating sufficient generalizability. CONCLUSIONS As the platform has proven useful, we aim to make it available as open-source software for other researchers.
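The comparison of crowdsourced masks against expert-labeled ground truth, and the improvement gained by fusing several crowd masks into a consensus, can be illustrated with a short Python sketch. It assumes binary NumPy masks and the standard pixel-wise F1 (Dice) definition, and it uses a strict pixel-wise majority vote as the consensus rule. The abstract does not state how the consensus masks were generated, so the vote rule, the function names `f_score` and `majority_consensus`, and the toy masks are all hypothetical.

```python
import numpy as np


def f_score(pred, truth):
    """Pixel-wise F1 (Dice) score of a binary mask: 2TP / (2TP + FP + FN)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    # Two empty masks are treated as perfect agreement.
    return 1.0 if denom == 0 else float(2 * tp / denom)


def majority_consensus(masks):
    """Hypothetical consensus rule: strict pixel-wise majority vote (ties -> background)."""
    stack = np.asarray(masks, dtype=bool)
    return 2 * stack.sum(axis=0) > stack.shape[0]


# Toy example: a 2x2 foreground region and three annotators, each making one error.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
a = truth.copy()
a[0, 0] = True    # one false positive
b = truth.copy()
b[1, 1] = False   # one false negative
c = truth.copy()
c[3, 3] = True    # a different false positive

for name, mask in (("a", a), ("b", b), ("c", c)):
    print(name, round(f_score(mask, truth), 3))
print("consensus", f_score(majority_consensus([a, b, c]), truth))  # consensus 1.0
```

Because each annotator errs on a different pixel, the vote cancels the individual mistakes in this toy case. The function returns a fraction, whereas the abstract reports F-scores as percentages.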
Affiliation(s)
- Nicolai Spicher
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Lower Saxony, Germany
- Tim Wesemeyer
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Lower Saxony, Germany
- Thomas M Deserno
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Lower Saxony, Germany
5
Xiong J, Kaur H, Heiser CN, McKinley ET, Roland JT, Coffey RJ, Shrubsole MJ, Wrobel J, Ma S, Lau KS, Vandekar S. GammaGateR: semi-automated marker gating for single-cell multiplexed imaging. Bioinformatics 2024; 40:btae356. [PMID: 38833684 PMCID: PMC11193056 DOI: 10.1093/bioinformatics/btae356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 04/20/2024] [Accepted: 06/03/2024] [Indexed: 06/06/2024]
Abstract
MOTIVATION Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. RESULTS To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, employing two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces highly similar results to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals, achieved by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results, while reducing the time of manual segmentation. AVAILABILITY AND IMPLEMENTATION The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.
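To make gamma-mixture marker gating concrete, here is a minimal Python sketch. It is not GammaGateR's implementation or API (GammaGateR is an R package). It fits a two-component gamma mixture to one marker's cell-level intensities by expectation-maximization. A weighted moment-matching M-step stands in for the closed-form estimator mentioned in the abstract, and the user-specified constraints are omitted. The sketch returns an estimated marker-positive proportion and a per-cell posterior probability of being marker-positive (a soft assignment). The function name `fit_gamma_mixture`, the mean-split initialisation and the simulated intensities are assumptions for illustration.

```python
import numpy as np
from scipy.stats import gamma


def fit_gamma_mixture(x, n_iter=200, tol=1e-8):
    """Fit a two-component gamma mixture to positive intensities by EM.

    Component 0 stands for marker-negative cells, component 1 for
    marker-positive cells. The M-step matches weighted moments
    (shape = mean^2 / var, scale = var / mean), a simplification chosen
    for this sketch rather than GammaGateR's closed-form estimator.
    Returns (pi_pos, shapes, scales, post_pos).
    """
    x = np.asarray(x, dtype=float)
    # Hypothetical initialisation: cells above the mean intensity start as positive.
    resp = (x > x.mean()).astype(float)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: mixing weights and moment-matched gamma parameters.
        weights, shapes, scales = [], [], []
        for r in (1.0 - resp, resp):
            w = r.sum()
            m = np.sum(r * x) / w
            v = np.sum(r * (x - m) ** 2) / w
            weights.append(w / x.size)
            shapes.append(m * m / v)
            scales.append(v / m)
        # E-step: posterior probability that each cell is marker-positive.
        dens = np.stack([weights[j] * gamma.pdf(x, a=shapes[j], scale=scales[j])
                         for j in range(2)])
        total = dens.sum(axis=0)
        resp = dens[1] / total
        ll = np.log(total).sum()
        # Moment matching does not guarantee a monotone likelihood, so stop on |change|.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights[1], shapes, scales, resp


# Simulated single-marker intensities: 700 negative and 300 positive cells.
rng = np.random.default_rng(0)
x = np.concatenate([rng.gamma(2.0, 0.5, size=700),    # dim background
                    rng.gamma(20.0, 0.25, size=300)])  # bright positives
pi_pos, shapes, scales, post = fit_gamma_mixture(x)
print(f"estimated positive fraction: {pi_pos:.3f}")
gate = post > 0.5  # optional hard gate derived from the soft assignment
```

Thresholding `post` gives a conventional hard gate. Keeping the probabilities instead is closer to the abstract's use of phenotype probabilities as input to downstream machine learning models.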
Affiliation(s)
- Jiangmei Xiong
- Department of Biostatistics, Vanderbilt University, 2525 West End Avenue, Suite 1100, Nashville, TN 37203-1741, United States
- Harsimran Kaur
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, 340 Light Hall, 2215 Garland Ave, Nashville, TN 37232, United States
- Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- Cody N Heiser
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, 340 Light Hall, 2215 Garland Ave, Nashville, TN 37232, United States
- Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- Regeneron Pharmaceuticals, 777 Old Saw Mill River Road, Tarrytown, NY 10591, United States
- Eliot T McKinley
- Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- GlaxoSmithKline, 410 Blackwell St, Durham, NC 27701, United States
- Joseph T Roland
- Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- Department of Surgery, Vanderbilt University Medical Center, 2215 Garland Ave Medical Research Building IV, Nashville, TN 37232, United States
- Robert J Coffey
- Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- Department of Medicine, Vanderbilt University Medical Center, 1161 21st Ave S, Nashville, TN 37232, United States
- Martha J Shrubsole
- Department of Medicine, Vanderbilt University Medical Center, 1161 21st Ave S, Nashville, TN 37232, United States
- Julia Wrobel
- Department of Biostatistics and Bioinformatics, Emory University, 1518 Clifton Rd, Atlanta, GA 30322, United States
- Siyuan Ma
- Department of Biostatistics, Vanderbilt University, 2525 West End Avenue, Suite 1100, Nashville, TN 37203-1741, United States
- Ken S Lau
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, 340 Light Hall, 2215 Garland Ave, Nashville, TN 37232, United States
- Epithelial Biology Center, Vanderbilt University Medical Center, MRBIV 10415-E, 2213 Garland Avenue, Nashville, TN 37232, United States
- Regeneron Pharmaceuticals, 777 Old Saw Mill River Road, Tarrytown, NY 10591, United States
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, 10475 Medical Research Building IV, 2215 Garland Avenue, Nashville, TN 37232, United States
- Simon Vandekar
- Department of Biostatistics, Vanderbilt University, 2525 West End Avenue, Suite 1100, Nashville, TN 37203-1741, United States
6
Razdaibiedina A, Brechalov A, Friesen H, Mattiazzi Usaj M, Masinas MPD, Garadi Suresh H, Wang K, Boone C, Ba J, Andrews B. PIFiA: self-supervised approach for protein functional annotation from single-cell imaging data. Mol Syst Biol 2024; 20:521-548. [PMID: 38472305 PMCID: PMC11066028 DOI: 10.1038/s44320-024-00029-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 02/27/2024] [Accepted: 02/28/2024] [Indexed: 03/14/2024]
Abstract
Fluorescence microscopy data describe protein localization patterns at single-cell resolution and have the potential to reveal whole-proteome functional information with remarkable precision. Yet, extracting biologically meaningful representations from cell micrographs remains a major challenge. Existing approaches often fail to learn robust and noise-invariant features or rely on supervised labels for accurate annotations. We developed PIFiA (Protein Image-based Functional Annotation), a self-supervised approach for protein functional annotation from single-cell imaging data. We imaged the global yeast ORF-GFP collection and applied PIFiA to generate protein feature profiles from single-cell images of fluorescently tagged proteins. We show that PIFiA outperforms existing approaches for molecular representation learning and describe a range of downstream analysis tasks to explore the information content of the feature profiles. Specifically, we cluster extracted features into a hierarchy of functional organization, study cell population heterogeneity, and develop techniques to distinguish multi-localizing proteins and identify functional modules. Finally, we confirm new PIFiA predictions using a colocalization assay, suggesting previously unappreciated biological roles for several proteins. Paired with a fully interactive website ( https://thecellvision.org/pifia/ ), PIFiA is a resource for the quantitative analysis of protein organization within the cell.
Affiliation(s)
- Anastasia Razdaibiedina
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Alexander Brechalov
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Helena Friesen
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Mojca Mattiazzi Usaj
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, ON, Canada
- Kyle Wang
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Charles Boone
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
- RIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako, Saitama, Japan.
- Jimmy Ba
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Brenda Andrews
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada.
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada.
7
Sarrazin-Gendron R, Ghasemloo Gheidari P, Butyaev A, Keding T, Cai E, Zheng J, Mutalova R, Mounthanyvong J, Zhu Y, Nazarova E, Drogaris C, Erhart K, Brouillette A, Richard G, Pitchford R, Caisse S, Blanchette M, McDonald D, Knight R, Szantner A, Waldispühl J. Improving microbial phylogeny with citizen science within a mass-market video game. Nat Biotechnol 2024:10.1038/s41587-024-02175-6. [PMID: 38622344 DOI: 10.1038/s41587-024-02175-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 02/05/2024] [Indexed: 04/17/2024]
Abstract
Citizen science video games are designed primarily for users already inclined to contribute to science, which severely limits their accessibility for an estimated community of 3 billion gamers worldwide. We created Borderlands Science (BLS), a citizen science activity that is seamlessly integrated within a popular commercial video game played by tens of millions of gamers. This integration is facilitated by a novel game-first design of citizen science games, in which the game design aspect has the highest priority, and a suitable task is then mapped to the game design. BLS crowdsources a multiple alignment task of 1 million 16S ribosomal RNA sequences obtained from human microbiome studies. Since its initial release on 7 April 2020, over 4 million players have solved more than 135 million science puzzles, a task unsolvable by a single individual. Leveraging these results, we show that our multiple sequence alignment simultaneously improves microbial phylogeny estimations and UniFrac effect sizes compared to state-of-the-art computational methods. This achievement demonstrates that hyper-gamified scientific tasks attract massive crowds of contributors and offers invaluable resources to the scientific community.
Affiliation(s)
- Timothy Keding
- School of Computer Science, McGill University, Montréal, QC, Canada
- Eddie Cai
- School of Computer Science, McGill University, Montréal, QC, Canada
- Jiayue Zheng
- School of Computer Science, McGill University, Montréal, QC, Canada
- Renata Mutalova
- School of Computer Science, McGill University, Montréal, QC, Canada
- Yuxue Zhu
- School of Computer Science, McGill University, Montréal, QC, Canada
- Elena Nazarova
- School of Computer Science, McGill University, Montréal, QC, Canada
- Kornél Erhart
- Massively Multiplayer Online Science, Gryon, Switzerland
- Daniel McDonald
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Rob Knight
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Department of Computer Science, University of California, San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA
- Attila Szantner
- School of Computer Science, McGill University, Montréal, QC, Canada
- Massively Multiplayer Online Science, Gryon, Switzerland
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montréal, QC, Canada.
8
Wilkinson JL, Thornhill I, Oldenkamp R, Gachanja A, Busquets R. Pharmaceuticals and Personal Care Products in the Aquatic Environment: How Can Regions at Risk be Identified in the Future? Environ Toxicol Chem 2024; 43:575-588. [PMID: 37818878 DOI: 10.1002/etc.5763] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 07/11/2023] [Accepted: 10/09/2023] [Indexed: 10/13/2023]
Abstract
Pharmaceuticals and personal care products (PPCPs) are an indispensable component of a healthy society. However, they are well-established environmental contaminants, and many can elicit biological disruption in exposed organisms. It is now a decade since the landmark review covering the top 20 questions on PPCPs in the environment (Boxall et al., 2012). In the present study we discuss key research priorities for the next 10 years with a focus on how regions where PPCPs pose the greatest risk to environmental and human health, either now or in the future, can be identified. Specifically, we discuss why this problem is of importance and review our current understanding of PPCPs in the aquatic environment. Foci include PPCP occurrence and what drives their environmental emission as well as our ability to both quantify and model their distribution. We highlight critical areas for future research including the involvement of citizen science for environmental monitoring and using modeling techniques to bridge the gap between research capacity and needs. Because prioritization of regions in need of environmental monitoring is needed to assess future/current risks, we also propose four criteria with which this may be achieved. By applying these criteria to available monitoring data, we narrow the focus on where monitoring efforts for PPCPs are most urgent. Specifically, we highlight 19 cities across Africa, Central America, the Caribbean, and Asia as priorities for future environmental monitoring and risk characterization and define four priority research questions for the next 10 years. Environ Toxicol Chem 2024;43:575-588. © 2023 The Authors. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.
Affiliation(s)
- John L Wilkinson
- Environment and Geography Department, University of York, York, UK
- Ian Thornhill
- School of Environment, Education and Development, The University of Manchester, Manchester, UK
- Rik Oldenkamp
- Amsterdam Institute for Life and Environment, Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Institute for Global Health and Development, University of Amsterdam, Amsterdam, The Netherlands
- Anthony Gachanja
- Department of Food Science and Post-Harvest Technology, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
- Rosa Busquets
- Department of Chemical and Pharmaceutical Sciences, Kingston University London, Kingston-upon-Thames, UK
9
Jan M, Spangaro A, Lenartowicz M, Mattiazzi Usaj M. From pixels to insights: Machine learning and deep learning for bioimage analysis. Bioessays 2024; 46:e2300114. [PMID: 38058114 DOI: 10.1002/bies.202300114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2023] [Revised: 10/25/2023] [Accepted: 11/13/2023] [Indexed: 12/08/2023]
Abstract
Bioimage analysis plays a critical role in extracting information from biological images, enabling deeper insights into cellular structures and processes. The integration of machine learning and deep learning techniques has revolutionized the field, enabling the automated, reproducible, and accurate analysis of biological images. Here, we provide an overview of the history and principles of machine learning and deep learning in the context of bioimage analysis. We discuss the essential steps of the bioimage analysis workflow, emphasizing how machine learning and deep learning have improved preprocessing, segmentation, feature extraction, object tracking, and classification. We provide examples that showcase the application of machine learning and deep learning in bioimage analysis. We examine user-friendly software and tools that enable biologists to leverage these techniques without extensive computational expertise. This review is a resource for researchers seeking to incorporate machine learning and deep learning in their bioimage analysis workflows and enhance their research in this rapidly evolving field.
Affiliation(s)
- Mahta Jan
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
- Allie Spangaro
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
- Michelle Lenartowicz
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
- Mojca Mattiazzi Usaj
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, Canada
10
Chai B, Efstathiou C, Yue H, Draviam VM. Opportunities and challenges for deep learning in cell dynamics research. Trends Cell Biol 2023:S0962-8924(23)00228-3. [PMID: 38030542 DOI: 10.1016/j.tcb.2023.10.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/31/2023] [Revised: 09/30/2023] [Accepted: 10/13/2023] [Indexed: 12/01/2023]
Abstract
The growth of artificial intelligence (AI) has led to an increase in the adoption of computer vision and deep learning (DL) techniques for the evaluation of microscopy images and movies. This adoption has not only addressed hurdles in quantitative analysis of dynamic cell biological processes but has also started to support advances in drug development, precision medicine, and genome-phenome mapping. We survey existing AI-based techniques and tools, as well as open-source datasets, with a specific focus on the computational tasks of segmentation, classification, and tracking of cellular and subcellular structures and dynamics. We summarise long-standing challenges in microscopy video analysis from a computational perspective and review emerging research frontiers and innovative applications for DL-guided automation in cell dynamics research.
Affiliation(s)
- Binghao Chai
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- Christoforos Efstathiou
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- Haoran Yue
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- Viji M Draviam
- School of Biological and Behavioural Sciences, Queen Mary University of London (QMUL), London E1 4NS, UK
- The Alan Turing Institute, London NW1 2DB, UK
11
Xiong J, Kaur H, Heiser CN, McKinley ET, Roland JT, Coffey RJ, Shrubsole MJ, Wrobel J, Ma S, Lau KS, Vandekar S. GammaGateR: semi-automated marker gating for single-cell multiplexed imaging. bioRxiv 2023:2023.09.20.558645. [PMID: 37781604 PMCID: PMC10541135 DOI: 10.1101/2023.09.20.558645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/03/2023]
Abstract
Motivation Multiplexed immunofluorescence (mIF) is an emerging assay for multichannel protein imaging that can decipher cell-level spatial features in tissues. However, existing automated cell phenotyping methods, such as clustering, face challenges in achieving consistency across experiments and often require subjective evaluation. As a result, mIF analyses often revert to marker gating based on manual thresholding of raw imaging data. Results To address the need for an evaluable semi-automated algorithm, we developed GammaGateR, an R package for interactive marker gating designed specifically for segmented cell-level data from mIF images. Based on a novel closed-form gamma mixture model, GammaGateR provides estimates of marker-positive cell proportions and soft clustering of marker-positive cells. The model incorporates user-specified constraints that provide a consistent but slide-specific model fit. We compared GammaGateR against the newest unsupervised approach for annotating mIF data, employing two colon datasets and one ovarian cancer dataset for the evaluation. We showed that GammaGateR produces highly similar results to a silver standard established through manual annotation. Furthermore, we demonstrated its effectiveness in identifying biological signals, achieved by mapping known spatial interactions between CD68 and MUC5AC cells in the colon and by accurately predicting survival in ovarian cancer patients using the phenotype probabilities as input for machine learning methods. GammaGateR is a highly efficient tool that can improve the replicability of marker gating results, while reducing the time of manual segmentation. Availability and Implementation The R package is available at https://github.com/JiangmeiRubyXiong/GammaGateR.
Affiliation(s)
- Harsimran Kaur
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, USA
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Cody N Heiser
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, USA
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Regeneron Pharmaceuticals, USA
- Eliot T McKinley
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- GlaxoSmithKline, USA
- Joseph T Roland
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Department of Surgery, Vanderbilt University Medical Center, USA
- Robert J Coffey
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Department of Medicine, Vanderbilt University Medical Center, USA
- Julia Wrobel
- Department of Biostatistics and Bioinformatics, Emory University, USA
- Siyuan Ma
- Department of Biostatistics, Vanderbilt University, USA
- Ken S Lau
- Program of Chemical and Physical Biology, Vanderbilt University School of Medicine, USA
- Epithelial Biology Center, Vanderbilt University Medical Center, USA
- Department of Surgery, Vanderbilt University Medical Center, USA
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, USA
12
Smith P, King ONF, Pennington A, Tun W, Basham M, Jones ML, Collinson LM, Darrow MC, Spiers H. Online citizen science with the Zooniverse for analysis of biological volumetric data. Histochem Cell Biol 2023; 160:253-276. [PMID: 37284846 PMCID: PMC10245346 DOI: 10.1007/s00418-023-02204-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 04/28/2023] [Indexed: 06/08/2023]
Abstract
Public participation in research, also known as citizen science, is being increasingly adopted for the analysis of biological volumetric data. Researchers working in this domain are applying online citizen science as a scalable distributed data analysis approach, with recent research demonstrating that non-experts can productively contribute to tasks such as the segmentation of organelles in volume electron microscopy data. This, alongside the growing challenge to rapidly process the large amounts of biological volumetric data now routinely produced, means there is increasing interest within the research community to apply online citizen science for the analysis of data in this context. Here, we synthesise core methodological principles and practices for applying citizen science for analysis of biological volumetric data. We collate and share the knowledge and experience of multiple research teams who have applied online citizen science for the analysis of volumetric biological data using the Zooniverse platform (www.zooniverse.org). We hope this provides inspiration and practical guidance regarding how contributor effort via online citizen science may be usefully applied in this domain.
Affiliation(s)
- Patricia Smith
- The Rosalind Franklin Institute, Harwell Campus, Fermi Avenue, Didcot, OX11 0FA, UK
- Oliver N F King
- Diamond Light Source, Harwell Campus, Fermi Avenue, Didcot, OX11 0DE, UK
- Avery Pennington
- The Rosalind Franklin Institute, Harwell Campus, Fermi Avenue, Didcot, OX11 0FA, UK
- Diamond Light Source, Harwell Campus, Fermi Avenue, Didcot, OX11 0DE, UK
- Win Tun
- Diamond Light Source, Harwell Campus, Fermi Avenue, Didcot, OX11 0DE, UK
- Mark Basham
- The Rosalind Franklin Institute, Harwell Campus, Fermi Avenue, Didcot, OX11 0FA, UK
- Diamond Light Source, Harwell Campus, Fermi Avenue, Didcot, OX11 0DE, UK
- Michele C Darrow
- The Rosalind Franklin Institute, Harwell Campus, Fermi Avenue, Didcot, OX11 0FA, UK
- Helen Spiers
- The Francis Crick Institute, London, NW1 1AT, UK

13
Soelistyo CJ, Ulicna K, Lowe AR. Machine learning enhanced cell tracking. Front Bioinform 2023; 3:1228989. [PMID: 37521315 PMCID: PMC10380934 DOI: 10.3389/fbinf.2023.1228989] [Received: 05/25/2023] [Accepted: 07/03/2023] [Indexed: 08/01/2023]
Abstract
Quantifying cell biology in space and time requires computational methods to detect cells, measure their properties, and assemble these into meaningful trajectories. In this aspect, machine learning (ML) is having a transformational effect on bioimage analysis, now enabling robust cell detection in multidimensional image data. However, the task of cell tracking, or constructing accurate multi-generational lineages from imaging data, remains an open challenge. Most cell tracking algorithms are largely based on our prior knowledge of cell behaviors, and as such, are difficult to generalize to new and unseen cell types or datasets. Here, we propose that ML provides the framework to learn aspects of cell behavior using cell tracking as the task to be learned. We suggest that advances in representation learning, cell tracking datasets, metrics, and methods for constructing and evaluating tracking solutions can all form part of an end-to-end ML-enhanced pipeline. These developments will lead the way to new computational methods that can be used to understand complex, time-evolving biological systems.
Affiliation(s)
- Christopher J. Soelistyo
- Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Institute for the Physics of Living Systems, London, United Kingdom
- Kristina Ulicna
- Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Institute for the Physics of Living Systems, London, United Kingdom
- Alan R. Lowe
- Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Institute for the Physics of Living Systems, London, United Kingdom
- Alan Turing Institute, London, United Kingdom

14
Doron M, Moutakanni T, Chen ZS, Moshkov N, Caron M, Touvron H, Bojanowski P, Pernice WM, Caicedo JC. Unbiased single-cell morphology with self-supervised vision transformers. bioRxiv 2023:2023.06.16.545359. [PMID: 37398158 PMCID: PMC10312751 DOI: 10.1101/2023.06.16.545359] [Indexed: 07/04/2023]
Abstract
Accurately quantifying cellular morphology at scale could substantially empower existing single-cell approaches. However, measuring cell morphology remains an active field of research, which has inspired multiple computer vision algorithms over the years. Here, we show that DINO, a vision-transformer based, self-supervised algorithm, has a remarkable ability for learning rich representations of cellular morphology without manual annotations or any other type of supervision. We evaluate DINO on a wide variety of tasks across three publicly available imaging datasets of diverse specifications and biological focus. We find that DINO encodes meaningful features of cellular morphology at multiple scales, from subcellular and single-cell resolution, to multi-cellular and aggregated experimental groups. Importantly, DINO successfully uncovers a hierarchy of biological and technical factors of variation in imaging datasets. The results show that DINO can support the study of unknown biological variation, including single-cell heterogeneity and relationships between samples, making it an excellent tool for image-based biological discovery.
Affiliation(s)
- Michael Doron
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nikita Moshkov
- Synthetic and Systems Biology Unit, Biological Research Centre (BRC), Szeged, Hungary
- Wolfgang M. Pernice
- Department of Neurology, Columbia University Medical Center, New York, NY, USA

15
Husain SS, Ong EJ, Minskiy D, Bober-Irizar M, Irizar A, Bober M. Single-cell subcellular protein localisation using novel ensembles of diverse deep architectures. Commun Biol 2023; 6:489. [PMID: 37147530 PMCID: PMC10163260 DOI: 10.1038/s42003-023-04840-z] [Received: 10/21/2022] [Accepted: 04/12/2023] [Indexed: 05/07/2023]
Abstract
Unravelling protein distributions within individual cells is vital to understanding their function and state and indispensable to developing new treatments. Here we present the Hybrid subCellular Protein Localiser (HCPL), which learns from weakly labelled data to robustly localise single-cell subcellular protein patterns. It comprises innovative DNN architectures exploiting wavelet filters and learnt parametric activations that successfully tackle drastic cell variability. HCPL features correlation-based ensembling of novel architectures that boosts performance and aids generalisation. Large-scale data annotation is made feasible by our AI-trains-AI approach, which determines the visual integrity of cells and emphasises reliable labels for efficient training. In the Human Protein Atlas context, we demonstrate that HCPL is best performing in the single-cell classification of protein localisation patterns. To better understand the inner workings of HCPL and assess its biological relevance, we analyse the contributions of each system component and dissect the emergent features from which the localisation predictions are derived.
Affiliation(s)
- Eng-Jon Ong
- CVSSP, University of Surrey, Guildford, GU2 7XH, Surrey, UK
- Dmitry Minskiy
- CVSSP, University of Surrey, Guildford, GU2 7XH, Surrey, UK
- Mikel Bober-Irizar
- CVSSP, University of Surrey, Guildford, GU2 7XH, Surrey, UK
- ForecomAI, London, W1W 5PF, UK
- Miroslaw Bober
- CVSSP, University of Surrey, Guildford, GU2 7XH, Surrey, UK
- ForecomAI, London, W1W 5PF, UK

16
Shin W, Im J, Koo RH, Kim J, Kwon KR, Kwon D, Kim JJ, Lee JH, Kwon D. Self-Curable Synaptic Ferroelectric FET Arrays for Neuromorphic Convolutional Neural Network. Adv Sci (Weinh) 2023; 10:e2207661. [PMID: 36973600 DOI: 10.1002/advs.202207661] [Received: 12/26/2022] [Revised: 02/20/2023] [Indexed: 05/27/2023]
Abstract
With the recently increasing prevalence of deep learning, both academia and industry exhibit substantial interest in neuromorphic computing, which mimics the functional and structural features of the human brain. To realize neuromorphic computing, an energy-efficient and reliable artificial synapse must be developed. In this study, the synaptic ferroelectric field-effect-transistor (FeFET) array is fabricated as a component of a neuromorphic convolutional neural network. Beyond the single transistor level, the long-term potentiation and depression of synaptic weights are achieved at the array level, and a successful program-inhibiting operation is demonstrated in the synaptic array, achieving a learning accuracy of 79.84% on the Canadian Institute for Advanced Research (CIFAR)-10 dataset. Furthermore, an efficient self-curing method is proposed to improve the endurance of the FeFET array by tenfold, utilizing the punch-through current inherent to the device. Low-frequency noise spectroscopy is employed to quantitatively evaluate the curing efficiency of the proposed self-curing method. The results of this study provide a method to fabricate and operate reliable synaptic FeFET arrays, thereby paving the way for further development of ferroelectric-based neuromorphic computing.
Affiliation(s)
- Wonjun Shin
- Department of Electrical and Computer Engineering, Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
- Jiyong Im
- Department of Electronic Engineering, Hanyang University, Seoul, 04763, South Korea
- Ryun-Han Koo
- Department of Electrical and Computer Engineering, Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
- Jaehyeon Kim
- Department of Electrical and Computer Engineering, Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
- Ki-Ryun Kwon
- Department of Electronic Engineering, Hanyang University, Seoul, 04763, South Korea
- Dongseok Kwon
- Department of Electrical and Computer Engineering, Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
- Jae-Joon Kim
- Department of Electrical and Computer Engineering, Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
- Jong-Ho Lee
- Department of Electrical and Computer Engineering, Inter-University Semiconductor Research Center, Seoul National University, Seoul, 08826, Republic of Korea
- Present address: Ministry of Science and ICT, Sejong, 30121, Republic of Korea
- Daewoong Kwon
- Department of Electronic Engineering, Hanyang University, Seoul, 04763, South Korea

17
Robillard AJ, Trizna MG, Ruiz-Tafur M, Dávila Panduro EL, de Santana CD, White AE, Dikow RB, Deichmann JL. Application of a deep learning image classifier for identification of Amazonian fishes. Ecol Evol 2023; 13:e9987. [PMID: 37143991 PMCID: PMC10151603 DOI: 10.1002/ece3.9987] [Received: 08/22/2022] [Revised: 03/10/2023] [Accepted: 03/24/2023] [Indexed: 05/06/2023]
Abstract
Given the sharp increase in agricultural and infrastructure development and the paucity of widespread data available to support conservation management decisions, a more rapid and accurate tool for identifying fish fauna in the world's largest freshwater ecosystem, the Amazon, is needed. Current strategies for identification of freshwater fishes require high levels of training and taxonomic expertise for morphological identification or genetic testing for species recognition at a molecular level. To overcome these challenges, we built an image masking model (U-Net) and a convolutional neural net (CNN) to classify Amazonian fish in photographs. Fish used to generate training data were collected and photographed in tributaries in seasonally flooded forests of the upper Morona River valley in Loreto, Peru in 2018 and 2019. Species identifications in the training images (n = 3068) were verified by expert ichthyologists. These images were supplemented with photographs taken of additional Amazonian fish specimens housed in the ichthyological collection of the Smithsonian's National Museum of Natural History. We generated a CNN model that identified 33 genera of fishes with a mean accuracy of 97.9%. Wider availability of accurate freshwater fish image recognition tools, such as the one described here, will enable fishermen, local communities, and citizen scientists to more effectively participate in collecting and sharing data from their territories to inform policy and management decisions that impact them directly.
Affiliation(s)
- Alexander J. Robillard
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- Chesapeake Biological Laboratory, University of Maryland Center for Environmental Science, Solomons, Maryland, USA
- Michael G. Trizna
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Morgan Ruiz-Tafur
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- Laboratorio de Taxonomía de Peces, Instituto de Investigaciones de la Amazonía Peruana (IIAP), San Juan Bautista, Peru
- Edgard Leonardo Dávila Panduro
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- C. David de Santana
- Division of Fishes, Department of Vertebrate Zoology, MRC 159, National Museum of Natural History, Smithsonian Institution, Washington, District of Columbia, USA
- Alexander E. White
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Rebecca B. Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, District of Columbia, USA
- Jessica L. Deichmann
- Center for Conservation and Sustainability, Smithsonian National Zoo and Conservation Biology Institute, Washington, District of Columbia, USA
- Working Land and Seascapes, Conservation Commons, Smithsonian Institution, Washington, District of Columbia, USA

18
Rädsch T, Reinke A, Weru V, Tizabi MD, Schreck N, Kavur AE, Pekdemir B, Roß T, Kopp-Schneider A, Maier-Hein L. Labelling instructions matter in biomedical image analysis. Nat Mach Intell 2023. [DOI: 10.1038/s42256-023-00625-5] [Indexed: 03/06/2023]
Abstract
Biomedical image analysis algorithm validation depends on high-quality annotation of reference datasets, for which labelling instructions are key. Despite their importance, their optimization remains largely unexplored. Here we present a systematic study of labelling instructions and their impact on annotation quality in the field. Through comprehensive examination of professional practice and international competitions registered at the Medical Image Computing and Computer Assisted Intervention Society, the largest international society in the biomedical imaging field, we uncovered a discrepancy between annotators’ needs for labelling instructions and their current quality and availability. On the basis of an analysis of 14,040 images annotated by 156 annotators from four professional annotation companies and 708 Amazon Mechanical Turk crowdworkers using instructions with different information density levels, we further found that including exemplary images substantially boosts annotation performance compared with text-only descriptions, while solely extending text descriptions does not. Finally, professional annotators constantly outperform Amazon Mechanical Turk crowdworkers. Our study raises awareness of the need for quality standards in biomedical image analysis labelling instructions.
19
Razdaibiedina A, Brechalov A, Friesen H, Usaj MM, Masinas MPD, Suresh HG, Wang K, Boone C, Ba J, Andrews B. PIFiA: Self-supervised Approach for Protein Functional Annotation from Single-Cell Imaging Data. bioRxiv 2023:2023.02.24.529975. [PMID: 36909656 PMCID: PMC10002629 DOI: 10.1101/2023.02.24.529975] [Indexed: 03/03/2023]
Abstract
Fluorescence microscopy data describe protein localization patterns at single-cell resolution and have the potential to reveal whole-proteome functional information with remarkable precision. Yet, extracting biologically meaningful representations from cell micrographs remains a major challenge. Existing approaches often fail to learn robust and noise-invariant features or rely on supervised labels for accurate annotations. We developed PIFiA, (Protein Image-based Functional Annotation), a self-supervised approach for protein functional annotation from single-cell imaging data. We imaged the global yeast ORF-GFP collection and applied PIFiA to generate protein feature profiles from single-cell images of fluorescently tagged proteins. We show that PIFiA outperforms existing approaches for molecular representation learning and describe a range of downstream analysis tasks to explore the information content of the feature profiles. Specifically, we cluster extracted features into a hierarchy of functional organization, study cell population heterogeneity, and develop techniques to distinguish multi-localizing proteins and identify functional modules. Finally, we confirm new PIFiA predictions using a colocalization assay, suggesting previously unappreciated biological roles for several proteins. Paired with a fully interactive website (https://thecellvision.org/pifia/), PIFiA is a resource for the quantitative analysis of protein organization within the cell.
Affiliation(s)
- Anastasia Razdaibiedina
- Department of Molecular Genetics, University of Toronto, Toronto ON, Canada
- The Donnelly Centre, University of Toronto, Toronto ON, Canada
- Vector Institute for Artificial Intelligence, Toronto ON, Canada
- Alexander Brechalov
- Department of Molecular Genetics, University of Toronto, Toronto ON, Canada
- The Donnelly Centre, University of Toronto, Toronto ON, Canada
- Helena Friesen
- The Donnelly Centre, University of Toronto, Toronto ON, Canada
- Kyle Wang
- Department of Molecular Genetics, University of Toronto, Toronto ON, Canada
- The Donnelly Centre, University of Toronto, Toronto ON, Canada
- Charles Boone
- Department of Molecular Genetics, University of Toronto, Toronto ON, Canada
- The Donnelly Centre, University of Toronto, Toronto ON, Canada
- RIKEN Center for Sustainable Resource Science, 2-1 Hirosawa, Wako, Saitama, Japan
- Jimmy Ba
- Department of Computer Science, University of Toronto, Toronto ON, Canada
- Vector Institute for Artificial Intelligence, Toronto ON, Canada
- Brenda Andrews
- Department of Molecular Genetics, University of Toronto, Toronto ON, Canada
- The Donnelly Centre, University of Toronto, Toronto ON, Canada

20
Gebodh N, Miskovic V, Laszlo S, Datta A, Bikson M. A Scalable Framework for Closed-Loop Neuromodulation with Deep Learning. bioRxiv 2023:2023.01.18.524615. [PMID: 36712027 PMCID: PMC9882307 DOI: 10.1101/2023.01.18.524615] [Indexed: 01/22/2023]
Abstract
Closed-loop neuromodulation measures dynamic neural or physiological activity to optimize interventions for clinical and nonclinical behavioral, cognitive, wellness, attentional, or general task performance enhancement. Conventional closed-loop stimulation approaches can contain biased biomarker detection (decoders and error-based triggering) and stimulation-type application. We present and verify a novel deep learning framework for designing and deploying flexible, data-driven, automated closed-loop neuromodulation that is scalable using diverse datasets, agnostic to stimulation technology (supporting multi-modal stimulation: tACS, tDCS, tFUS, TMS), and without the need for personalized ground-truth performance data. Our approach is based on identified periods of responsiveness: detected states that result in a change in performance when stimulation is applied compared to no stimulation. To demonstrate our framework, we acquire, analyze, and apply a data-driven approach to our open-sourced GX dataset, which includes concurrent physiological (ECG, EOG) and neuronal (EEG) measures, paired with continuous vigilance/attention-fatigue tracking, and High-Definition transcranial electrical stimulation (HD-tES). Our framework's decision process for intervention application identified 88.26% of trials as correct applications, showed potential improvement with varying stimulation types, or missed opportunities to stimulate, whereas 11.25% of trials were predicted to stimulate at inopportune times. With emerging datasets and stimulation technologies, our unifying and integrative framework, leveraging deep learning (convolutional neural networks, CNNs), demonstrates the adaptability and feasibility of automated multimodal neuromodulation for both clinical and nonclinical applications.
Affiliation(s)
- Nigel Gebodh
- The Department of Biomedical Engineering, The City College of New York, The City University of New York, New York, USA
- Marom Bikson
- The Department of Biomedical Engineering, The City College of New York, The City University of New York, New York, USA

21
Zhu XL, Bao LX, Xue MQ, Xu YY. Automatic recognition of protein subcellular location patterns in single cells from immunofluorescence images based on deep learning. Brief Bioinform 2023; 24:6964519. [PMID: 36577448 DOI: 10.1093/bib/bbac609] [Received: 08/07/2022] [Revised: 11/16/2022] [Accepted: 12/11/2022] [Indexed: 12/30/2022]
Abstract
With the improvement of single-cell measurement techniques, there is a growing awareness that individual differences exist among cells, and protein expression distribution can vary across cells in the same tissue or cell line. Pinpointing the protein subcellular locations in single cells is crucial for mapping functional specificity of proteins and studying related diseases. Currently, research about single-cell protein location is still in its infancy, and most studies and databases do not annotate proteins at the cell level. For example, in the human protein atlas database, an immunofluorescence image stained for a particular protein shows multiple cells, but the subcellular location annotation is for the whole image, ignoring intercellular difference. In this study, we used large-scale immunofluorescence images and image-level subcellular locations to develop a deep-learning-based pipeline that could accurately recognize protein localizations in single cells. The pipeline consisted of two deep learning models, i.e. an image-based model and a cell-based model. The former used a multi-instance learning framework to comprehensively model protein distribution in multiple cells in each image, and could give both image-level and cell-level predictions. The latter firstly used clustering and heuristics algorithms to assign pseudo-labels of subcellular locations to the segmented cell images, and then used the pseudo-labels to train a classification model. Finally, the image-based model was fused with the cell-based model at the decision level to obtain the final ensemble model for single-cell prediction. Our experimental results showed that the ensemble model could achieve higher accuracy and robustness on independent test sets than state-of-the-art methods.
Affiliation(s)
- Xi-Liang Zhu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Lin-Xia Bao
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Min-Qi Xue
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Ying-Ying Xu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China

22
Huang Y, Chang H, Chen X, Meng J, Han M, Huang T, Yuan L, Zhang G. A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data. Quant Biol 2023. [DOI: 10.15302/j-qb-022-0311] [Indexed: 03/11/2023]

23
Miller JA, Vepřek LH, Deterding S, Cooper S. Practical recommendations from a multi-perspective needs and challenges assessment of citizen science games. PLoS One 2023; 18:e0285367. [PMID: 37146022 PMCID: PMC10162532 DOI: 10.1371/journal.pone.0285367] [Received: 11/08/2022] [Accepted: 04/21/2023] [Indexed: 05/07/2023]
Abstract
Citizen science games are an increasingly popular form of citizen science, in which volunteer participants engage in scientific research while playing a game. Their success depends on a diverse set of stakeholders working together: scientists, volunteers, and game developers. Yet the potential needs of these stakeholder groups and their possible tensions are poorly understood. To identify these needs and possible tensions, we conducted a qualitative data analysis of two years of ethnographic research and 57 interviews with stakeholders from 10 citizen science games, following a combination of grounded theory and reflexive thematic analysis. We identify individual stakeholder needs as well as important barriers to citizen science game success. These include the ambiguous allocation of developer roles, limited resources and funding dependencies, the need for a citizen science game community, and science-game tensions. We derive recommendations for addressing these barriers.
Affiliation(s)
- Seth Cooper
- Northeastern University, Boston, Massachusetts, United States of America

24
Bhatia HS, Brunner AD, Öztürk F, Kapoor S, Rong Z, Mai H, Thielert M, Ali M, Al-Maskari R, Paetzold JC, Kofler F, Todorov MI, Molbay M, Kolabas ZI, Negwer M, Hoeher L, Steinke H, Dima A, Gupta B, Kaltenecker D, Caliskan ÖS, Brandt D, Krahmer N, Müller S, Lichtenthaler SF, Hellal F, Bechmann I, Menze B, Theis F, Mann M, Ertürk A. Spatial proteomics in three-dimensional intact specimens. Cell 2022; 185:5040-5058.e19. [PMID: 36563667 DOI: 10.1016/j.cell.2022.11.021] [Received: 05/03/2021] [Revised: 06/13/2022] [Accepted: 11/18/2022] [Indexed: 12/24/2022]
Abstract
Spatial molecular profiling of complex tissues is essential to investigate cellular function in physiological and pathological states. However, methods for molecular analysis of large biological specimens imaged in 3D are lacking. Here, we present DISCO-MS, a technology that combines whole-organ/whole-organism clearing and imaging, deep-learning-based image analysis, robotic tissue extraction, and ultra-high-sensitivity mass spectrometry. DISCO-MS yielded proteome data indistinguishable from uncleared samples in both rodent and human tissues. We used DISCO-MS to investigate microglia activation along axonal tracts after brain injury and characterized early- and late-stage individual amyloid-beta plaques in a mouse model of Alzheimer's disease. DISCO-bot robotic sample extraction enabled us to study the regional heterogeneity of immune cells in intact mouse bodies and aortic plaques in a complete human heart. DISCO-MS enables unbiased proteome analysis of preclinical and clinical tissues after unbiased imaging of entire specimens in 3D, identifying diagnostic and therapeutic opportunities for complex diseases.
Affiliation(s)
- Harsharan Singh Bhatia
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany
- Andreas-David Brunner
- Department for Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany; Boehringer Ingelheim Pharma GmbH & Co. KG, Drug Discovery Sciences, Birkendorfer Str. 65, D-88400 Biberach Riss, Germany
- Furkan Öztürk
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Saketh Kapoor
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Zhouyi Rong
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Munich Medical Research School (MMRS), 80336 Munich, Germany
- Hongcheng Mai
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Munich Medical Research School (MMRS), 80336 Munich, Germany
- Marvin Thielert
- Department for Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Mayar Ali
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Graduate School of Neuroscience (GSN), 82152 Munich, Germany
- Rami Al-Maskari
- Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Center for Translational Cancer Research (TranslaTUM) of the TUM, 81675 Munich, Germany; Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, 85748 Garching, Germany
- Johannes Christian Paetzold
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Center for Translational Cancer Research (TranslaTUM) of the TUM, 81675 Munich, Germany; Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, 85748 Garching, Germany; Biomedical Image Analysis Group, Department of Computing, Imperial College London, London SW7 2AZ, UK
- Florian Kofler
- Center for Translational Cancer Research (TranslaTUM) of the TUM, 81675 Munich, Germany; Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, 85748 Garching, Germany; Helmholtz AI, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Department of Neuroradiology, Klinikum rechts der Isar, 81675 Munich, Germany
- Mihail Ivilinov Todorov
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany
- Muge Molbay
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Munich Medical Research School (MMRS), 80336 Munich, Germany
- Zeynep Ilgin Kolabas
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Graduate School of Neuroscience (GSN), 82152 Munich, Germany
- Moritz Negwer
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Luciano Hoeher
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Hanno Steinke
- Institute of Anatomy, University of Leipzig, 04109 Leipzig, Germany
- Alina Dima
- Center for Translational Cancer Research (TranslaTUM) of the TUM, 81675 Munich, Germany; Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, 85748 Garching, Germany
- Basavdatta Gupta
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Doris Kaltenecker
- Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Institute for Diabetes and Cancer, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Özüm Sehnaz Caliskan
- Institute for Diabetes and Obesity, Helmholtz Zentrum München, 85764 Neuherberg, Germany; German Center for Diabetes Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Daniel Brandt
- Institute for Diabetes and Obesity, Helmholtz Zentrum München, 85764 Neuherberg, Germany; German Center for Diabetes Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Natalie Krahmer
- Institute for Diabetes and Obesity, Helmholtz Zentrum München, 85764 Neuherberg, Germany; German Center for Diabetes Research, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Stephan Müller
- German Center for Neurodegenerative Diseases (DZNE), 81377 Munich, Germany; Neuroproteomics, School of Medicine, Klinikum rechts der Isar, Technical University of Munich, 81675 Munich, Germany
- Stefan Frieder Lichtenthaler
- Graduate School of Neuroscience (GSN), 82152 Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), 81377 Munich, Germany; German Center for Neurodegenerative Diseases (DZNE), 81377 Munich, Germany; Neuroproteomics, School of Medicine, Klinikum rechts der Isar, Technical University of Munich, 81675 Munich, Germany
- Farida Hellal
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany
- Ingo Bechmann
- Institute of Anatomy, University of Leipzig, 04109 Leipzig, Germany
- Bjoern Menze
- Center for Translational Cancer Research (TranslaTUM) of the TUM, 81675 Munich, Germany; Image-Based Biomedical Modeling, Department of Informatics, Technical University of Munich, 85748 Garching, Germany; Department for Quantitative Biomedicine, University of Zurich, 8006 Zurich, Switzerland
- Fabian Theis
- Institute of Computational Biology, Helmholtz Zentrum München, 85764 Neuherberg, Germany; TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany; Department of Mathematics, Technical University of Munich, 85748 Garching, Germany
- Matthias Mann
- Department for Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany; NNF Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Ali Ertürk
- Institute for Tissue Engineering and Regenerative Medicine (iTERM), Helmholtz Zentrum München, 85764 Neuherberg, Germany; Institute for Stroke and Dementia Research, Klinikum der Universität München, Ludwig-Maximilians University Munich, 81377 Munich, Germany; Graduate School of Neuroscience (GSN), 82152 Munich, Germany; Munich Cluster for Systems Neurology (SyNergy), 81377 Munich, Germany
25
Toth T, Bauer D, Sukosd F, Horvath P. Fisheye transformation enhances deep-learning-based single-cell phenotyping by including cellular microenvironment. Cell Rep Methods 2022; 2:100339. [PMID: 36590690 PMCID: PMC9795324 DOI: 10.1016/j.crmeth.2022.100339] [Received: 05/13/2022] [Revised: 08/22/2022] [Accepted: 10/21/2022]
Abstract
Incorporating information about the surroundings can have a significant impact on successfully determining the class of an object. This is of particular interest when determining the phenotypes of cells, for example, in the context of high-throughput screens. We hypothesized that an ideal approach would consider the fully featured view of the cell of interest, include its neighboring microenvironment, and give lesser weight to cells that are far from the cell of interest. To satisfy these criteria, we present an approach with a transformation similar to those characteristic of fisheye cameras. Using this transformation with proper settings, we could significantly increase the accuracy of single-cell phenotyping, both in the case of cell culture and tissue-based microscopy images, and we present improved results on a dataset containing images of wild animals.
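The abstract describes the fisheye transformation only qualitatively: full detail at the cell of interest, with the surrounding neighborhood still visible but given less weight. As an illustration, that idea can be sketched in Python with NumPy. This is not the authors' implementation; the function name `fisheye_crop`, the exponential radial profile controlled by the hypothetical parameter `k`, and the nearest-neighbor sampling are all assumptions made for this example.

```python
import numpy as np


def fisheye_crop(image, center, out_size=65, radius=96, k=3.0):
    """Sample a fixed-size patch around `center` with a radial fisheye mapping.

    Hypothetical sketch (not the published method): an output radius r in
    [0, 1] maps to a source radius of radius * (exp(k * r) - 1) / (exp(k) - 1).
    For k > 0 the centre is sampled finely and the periphery coarsely, so
    distant neighbours stay in view but occupy fewer output pixels.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    h, w = image.shape[:2]
    cy, cx = center
    half = (out_size - 1) / 2.0

    # Normalised output offsets: 0 at the patch centre, +/-1 at the mid-edges.
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    dy = (ys - half) / half
    dx = (xs - half) / half
    r_out = np.hypot(dy, dx)

    # ratio = r_in / r_out; its limit as r_out -> 0 is k / (exp(k) - 1).
    ratio = np.full(r_out.shape, k / np.expm1(k))
    nz = r_out > 0
    ratio[nz] = np.expm1(k * r_out[nz]) / (np.expm1(k) * r_out[nz])

    # Source coordinates, nearest-neighbour sampling, clamped to the image.
    sy = np.clip(np.rint(cy + dy * ratio * radius).astype(int), 0, h - 1)
    sx = np.clip(np.rint(cx + dx * ratio * radius).astype(int), 0, w - 1)
    return image[sy, sx]
```

As `k` approaches 0 the mapping approaches an ordinary square crop of half-width `radius`; larger `k` devotes more of the fixed-size patch to the central cell and compresses distant neighbors toward the border. The resulting patch could be passed to an image classifier in place of a plain crop.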
Affiliation(s)
- Timea Toth
- Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
- Doctoral School of Biology, University of Szeged, Szeged, Hungary
- David Bauer
- Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
- Farkas Sukosd
- Department of Pathology, University of Szeged, Szeged, Hungary
- Peter Horvath
- Synthetic and Systems Biology Unit, Biological Research Centre, Eötvös Loránd Research Network, Szeged, Hungary
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Single-Cell Technologies, Inc., Szeged, Hungary
26
Mou M, Pan Z, Lu M, Sun H, Wang Y, Luo Y, Zhu F. Application of Machine Learning in Spatial Proteomics. J Chem Inf Model 2022; 62:5875-5895. [PMID: 36378082 DOI: 10.1021/acs.jcim.2c01161]
Abstract
Spatial proteomics is an interdisciplinary field that investigates the localization and dynamics of proteins, and it has gained extensive attention in recent years, especially subcellular proteomics. Considerable evidence indicates that the subcellular localization of proteins is associated with various cellular processes and disease progression. Mass spectrometry (MS)-based and imaging-based experimental approaches have been developed to acquire large-scale spatial proteomic data. To allow reliable analysis of increasingly complex spatial proteomics data, machine learning (ML) methods have been widely used in both MS-based and imaging-based spatial proteomic data analysis pipelines. Here, we comprehensively survey the applications of ML in spatial proteomics from the following aspects: (1) data resources for the spatial proteome are introduced; (2) the roles of different ML algorithms in data analysis pipelines are elaborated; (3) successful applications of spatial proteomics and several analytical tools integrating ML methods are presented; (4) challenges in modern ML-based spatial proteomics studies are discussed. This review provides guidelines for researchers seeking to apply ML methods to analyze spatial proteomic data and can support a deeper understanding of cell biology as well as future research in the medical and drug discovery communities.
Affiliation(s)
- Minjie Mou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Huaicheng Sun
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
27
Artificial intelligence in science: An emerging general method of invention. Res Policy 2022. [DOI: 10.1016/j.respol.2022.104604]
28
Crook OM, Davies CTR, Breckels LM, Christopher JA, Gatto L, Kirk PDW, Lilley KS. Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE. Nat Commun 2022; 13:5948. [PMID: 36216816 PMCID: PMC9550814 DOI: 10.1038/s41467-022-33570-9] [Received: 11/12/2021] [Accepted: 09/20/2022]
Abstract
The steady-state localisation of proteins provides vital insight into their function. These localisations are context specific with proteins translocating between different subcellular niches upon perturbation of the subcellular environment. Differential localisation, that is a change in the steady-state subcellular location of a protein, provides a step towards mechanistic insight of subcellular protein dynamics. High-accuracy high-throughput mass spectrometry-based methods now exist to map the steady-state localisation and re-localisation of proteins. Here, we describe a principled Bayesian approach, BANDLE, that uses these data to compute the probability that a protein differentially localises upon cellular perturbation. Extensive simulation studies demonstrate that BANDLE reduces the number of both type I and type II errors compared to existing approaches. Application of BANDLE to several datasets recovers well-studied translocations. In an application to cytomegalovirus infection, we obtain insights into the rewiring of the host proteome. Integration of other high-throughput datasets allows us to provide the functional context of these data.
Affiliation(s)
- Oliver M Crook
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, Cambridge, UK
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
- Colin T R Davies
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
- Mechanistic Biology and Profiling, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
- Lisa M Breckels
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
- Josie A Christopher
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
- Laurent Gatto
- de Duve Institute, Université catholique de Louvain, Avenue Hippocrate 75, 1200, Brussels, Belgium
- Paul D W Kirk
- MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, UK
- Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, Cambridge, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, CB2 0AW, UK
29
Analysis of the Human Protein Atlas Weakly Supervised Single-Cell Classification competition. Nat Methods 2022; 19:1221-1229. [PMID: 36175767 PMCID: PMC9550622 DOI: 10.1038/s41592-022-01606-z] [Received: 04/14/2021] [Accepted: 08/10/2022]
Abstract
While spatial proteomics by fluorescence imaging has quickly become an essential discovery tool for researchers, fast and scalable methods to classify and embed single-cell protein distributions in such images are lacking. Here, we present the design and analysis of the results from the competition Human Protein Atlas - Single-Cell Classification hosted on the Kaggle platform. This represents a crowd-sourced competition to develop machine learning models trained on limited annotations to label single-cell protein patterns in fluorescent images. The particular challenges of this competition include class imbalance, weak labels and multi-label classification, prompting competitors to apply a wide range of approaches in their solutions. The winning models serve as the first subcellular omics tools that can annotate single-cell locations, extract single-cell features and capture cellular dynamics.
30
31
Wang S, Linsley JW, Linsley DA, Lamstein J, Finkbeiner S. Fluorescently labeled nuclear morphology is highly informative of neurotoxicity. Front Toxicol 2022; 4:935438. [PMID: 36093369 PMCID: PMC9449453 DOI: 10.3389/ftox.2022.935438] [Received: 05/03/2022] [Accepted: 07/27/2022]
Abstract
Neurotoxicity can be detected in live microscopy by morphological changes such as retraction of neurites, fragmentation, blebbing of the neuronal soma and ultimately the disappearance of fluorescently labeled neurons. However, quantification of these features is often difficult, low-throughput, and imprecise due to the overreliance on human curation. Recently, we showed that convolutional neural network (CNN) models can outperform human curators in the assessment of neuronal death from images of fluorescently labeled neurons, suggesting that there is information within the images that indicates toxicity but that is not apparent to the human eye. In particular, the CNN's decision strategy indicated that information within the nuclear region was essential for its superhuman performance. Here, we systematically tested this prediction by comparing images of fluorescent neuronal morphology from nuclear-localized fluorescent protein to those from freely diffused fluorescent protein for classifying neuronal death. We found that biomarker-optimized (BO-) CNNs could learn to classify neuronal death from fluorescent protein-localized nuclear morphology (mApple-NLS-CNN) alone, with superhuman accuracy. Furthermore, leveraging methods from explainable artificial intelligence, we identified novel features within the nuclear-localized fluorescent protein signal that were indicative of neuronal death. Our findings suggest that the use of a nuclear morphology marker in live imaging combined with computational models such as mApple-NLS-CNN can provide an optimal readout of neuronal death, a common result of neurotoxicity.
Affiliation(s)
- Shijie Wang
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Jeremy W. Linsley
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Drew A. Linsley
- Robert J. and Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI, United States
- Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, RI, United States
- Josh Lamstein
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Steven Finkbeiner
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA, United States
- Taube/Koret Center for Neurodegenerative Disease, Gladstone Institutes, San Francisco, CA, United States
- Departments of Neurology and Physiology, University of California, San Francisco, San Francisco, CA, United States
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA, United States
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, United States
32
Hajiabadi H, Mamontova I, Prizak R, Pancholi A, Koziolek A, Hilbert L. Deep-learning microscopy image reconstruction with quality control reveals second-scale rearrangements in RNA polymerase II clusters. PNAS Nexus 2022; 1:pgac065. [PMID: 36741438 PMCID: PMC9896941 DOI: 10.1093/pnasnexus/pgac065] [Received: 01/20/2022] [Accepted: 05/17/2022]
Abstract
Fluorescence microscopy, a central tool of biological research, is subject to inherent trade-offs in experiment design. For instance, image acquisition speed can only be increased in exchange for a lowered signal quality, or for an increased rate of photo-damage to the specimen. Computational denoising can recover some loss of signal, extending the trade-off margin for high-speed imaging. Recently proposed denoising on the basis of neural networks shows exceptional performance but raises concerns about errors typical of neural networks. Here, we present a workflow that supports an empirically optimized reduction of exposure times, as well as per-image quality control to exclude images with reconstruction errors. We implement this workflow on the basis of the denoising tool Noise2Void and assess the molecular state and 3D shape of RNA polymerase II (Pol II) clusters in live zebrafish embryos. Image acquisition speed could be tripled, achieving 2-s time resolution and 350-nm lateral image resolution. The obtained data reveal stereotyped events of approximately 10 s duration: initially, the molecular mark for recruited Pol II increases, then the mark for active Pol II increases, and finally Pol II clusters take on a stretched and unfolded shape. An independent analysis based on fixed sample images reproduces this sequence of events, and suggests that they are related to the transient association of genes with Pol II clusters. Our workflow consists of procedures that can be implemented on commercial fluorescence microscopes without any hardware or software modification, and should, therefore, be transferable to many other applications.
Affiliation(s)
- Roshan Prizak
- Institute of Biological and Chemical Systems, Department of Biological Information Processing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
- Agnieszka Pancholi
- Institute of Biological and Chemical Systems, Department of Biological Information Processing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
33
Andreasson JOL, Gotrik MR, Wu MJ, Wayment-Steele HK, Kladwang W, Portela F, Wellington-Oguri R, Das R, Greenleaf WJ. Crowdsourced RNA design discovers diverse, reversible, efficient, self-contained molecular switches. Proc Natl Acad Sci U S A 2022; 119:e2112979119. [PMID: 35471911 PMCID: PMC9170038 DOI: 10.1073/pnas.2112979119] [Received: 07/14/2021] [Accepted: 03/09/2022]
Abstract
Internet-based scientific communities promise a means to apply distributed, diverse human intelligence toward previously intractable scientific problems. However, current implementations have not allowed communities to propose experiments to test all emerging hypotheses at scale or to modify hypotheses in response to experiments. We report high-throughput methods for molecular characterization of nucleic acids that enable the large-scale video game–based crowdsourcing of RNA sensor design, followed by high-throughput functional characterization. Iterative design testing of thousands of crowdsourced RNA sensor designs produced near–thermodynamically optimal and reversible RNA switches that act as self-contained molecular sensors and couple five distinct small molecule inputs to three distinct protein binding and fluorogenic outputs. This work suggests a paradigm for widely distributed experimental bioscience.
Affiliation(s)
- Johan O. L. Andreasson
- Department of Genetics, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Department of Biochemistry, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Michael R. Gotrik
- Department of Biochemistry, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Michelle J. Wu
- Biomedical Informatics Training Program, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Wipapat Kladwang
- Department of Biochemistry, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Fernando Portela
- Department of Biochemistry, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Eterna Massive Open Laboratory
- Roger Wellington-Oguri
- Department of Biochemistry, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Eterna Massive Open Laboratory
- Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Biomedical Informatics Training Program, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Department of Physics, Stanford University, Stanford, CA 94305
- William J. Greenleaf
- Department of Genetics, Stanford University School of Medicine, Stanford University, Stanford, CA 94305
- Department of Applied Physics, Stanford University, Stanford, CA 94305
- Chan Zuckerberg Biohub, San Francisco, CA
34
Butyaev A, Drogaris C, Tremblay-Savard O, Waldispühl J. Human-supervised clustering of multidimensional data using crowdsourcing. R Soc Open Sci 2022; 9:211189. [PMID: 35620007 PMCID: PMC9128850 DOI: 10.1098/rsos.211189] [Received: 07/15/2021] [Accepted: 04/29/2022]
Abstract
Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.
35
Nam S, Kim D, Jung W, Zhu Y. Understanding the Research Landscape of Deep Learning in Biomedical Science: Scientometric Analysis. J Med Internet Res 2022; 24:e28114. [PMID: 35451980 PMCID: PMC9077503 DOI: 10.2196/28114] [Received: 02/22/2021] [Revised: 05/30/2021] [Accepted: 02/20/2022]
Abstract
BACKGROUND: Advances in biomedical research using deep learning techniques have generated a large volume of related literature. However, there is a lack of scientometric studies that provide a bird's-eye view of them. This absence has led to a partial and fragmented understanding of the field and its progress.
OBJECTIVE: This study aimed to gain a quantitative and qualitative understanding of the scientific domain by analyzing diverse bibliographic entities that represent the research landscape from multiple perspectives and levels of granularity.
METHODS: We searched and retrieved 978 deep learning studies in biomedicine from the PubMed database. A scientometric analysis was performed by analyzing the metadata, content of influential works, and cited references.
RESULTS: In the process, we identified the current leading fields, major research topics and techniques, knowledge diffusion, and research collaboration. There was a predominant focus on applying deep learning, especially convolutional neural networks, to radiology and medical imaging, whereas a few studies focused on protein or genome analysis. Radiology and medical imaging also appeared to be the most significant knowledge sources and an important field in knowledge diffusion, followed by computer science and electrical engineering. A coauthorship analysis revealed various collaborations among engineering-oriented and biomedicine-oriented clusters of disciplines.
CONCLUSIONS: This study investigated the landscape of deep learning research in biomedicine and confirmed its interdisciplinary nature. Although it has been successful, we believe that there is a need for diverse applications in certain areas to further boost the contributions of deep learning in addressing biomedical research problems. We expect the results of this study to help researchers and communities better align their present and future work.
Affiliation(s)
- Seojin Nam
- Department of Library and Information Science, Sungkyunkwan University, Seoul, Republic of Korea
- Donghun Kim
- Department of Library and Information Science, Sungkyunkwan University, Seoul, Republic of Korea
- Woojin Jung
- Department of Library and Information Science, Sungkyunkwan University, Seoul, Republic of Korea
- Yongjun Zhu
- Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
36
Chu Y, Li P, Bai Y, Hu Z, Chen Y, Lu J. Group channel pruning and spatial attention distilling for object detection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03293-x]
37
Cho NH, Cheveralls KC, Brunner AD, Kim K, Michaelis AC, Raghavan P, Kobayashi H, Savy L, Li JY, Canaj H, Kim JY, Stewart EM, Gnann C, McCarthy F, Cabrera JP, Brunetti RM, Chhun BB, Dingle G, Hein MY, Huang B, Mehta SB, Weissman JS, Gómez-Sjöberg R, Itzhak DN, Royer LA, Mann M, Leonetti MD. OpenCell: Endogenous tagging for the cartography of human cellular organization. Science 2022; 375:eabi6983. [PMID: 35271311 PMCID: PMC9119736 DOI: 10.1126/science.abi6983]
Abstract
Elucidating the wiring diagram of the human cell is a central goal of the postgenomic era. We combined genome engineering, confocal live-cell imaging, mass spectrometry, and data science to systematically map the localization and interactions of human proteins. Our approach provides a data-driven description of the molecular and spatial networks that organize the proteome. Unsupervised clustering of these networks delineates functional communities that facilitate biological discovery. We found that remarkably precise functional information can be derived from protein localization patterns, which often contain enough information to identify molecular interactions, and that RNA binding proteins form a specific subgroup defined by unique interaction and localization properties. Paired with a fully interactive website (opencell.czbiohub.org), our work constitutes a resource for the quantitative cartography of human cellular organization.
Affiliation(s)
- Andreas-David Brunner
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Kibeom Kim
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- André C. Michaelis
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Laura Savy
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Jason Y. Li
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Hera Canaj
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Christian Gnann
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH-Royal Institute of Technology, Stockholm, Sweden
- Rachel M. Brunetti
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Greg Dingle
- Chan Zuckerberg Initiative, Redwood City, CA, USA
- Bo Huang
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA
- Jonathan S. Weissman
- Whitehead Institute, Koch Institute, Howard Hughes Medical Institute, and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA, USA
- Matthias Mann
- Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
38
Winfree S. User-Accessible Machine Learning Approaches for Cell Segmentation and Analysis in Tissue. Front Physiol 2022; 13:833333. [PMID: 35360226 PMCID: PMC8960722 DOI: 10.3389/fphys.2022.833333] [Received: 12/13/2021] [Accepted: 01/12/2022]
Abstract
Advanced image analysis with machine and deep learning has improved cell segmentation and classification, yielding novel insights into biological mechanisms. These approaches have been used to analyze cells in situ, within tissue, and have both confirmed existing models and uncovered new models of cellular microenvironments in human disease. This has been achieved by developing both imaging-modality-specific and multimodal solutions for cellular segmentation. These solutions address the fundamental requirement for high-quality, reproducible cell segmentation in images from immunofluorescence, immunohistochemistry and histological stains. The expansive landscape of cell types (from a variety of species, organs and cellular states) has required a concerted effort to build libraries of annotated cells for training data. It has also required novel solutions for leveraging annotations across imaging modalities, and in some cases has led researchers to question the requirement for single-cell demarcation altogether. Unfortunately, bleeding-edge approaches are often confined to a few experts with the necessary domain knowledge. However, freely available, open-source tools and libraries of trained machine learning models have been made accessible to researchers in the biomedical sciences, both as software pipelines and as plugins for free, open-source desktop and web-based software. The future holds exciting possibilities. These include expanding machine learning models for segmentation, either through the brute-force addition of new training data or through novel network architectures. They also include using machine and deep learning in cell and neighborhood classification to uncover cellular microenvironments, and developing new strategies for the use of machine and deep learning in biomedical research.
Affiliation(s)
- Seth Winfree
- Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE, United States
39
Shen Z, Zhang Q, Han K, Huang DS. A Deep Learning Model for RNA-Protein Binding Preference Prediction Based on Hierarchical LSTM and Attention Network. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:753-762. [PMID: 32750884 DOI: 10.1109/tcbb.2020.3007544]
Abstract
The attention mechanism can find important information in a sequence. Regions of an RNA sequence that can bind to proteins are more important than those that cannot. Neither conventional methods nor deep learning-based methods are good at learning this information. In this study, an LSTM is used to extract correlation features between different sites in the RNA sequence, and an attention mechanism is used to evaluate the importance of those sites. We obtained the optimal combination of k-mer length, k-mer stride window, k-mer sentence length, k-mer sentence stride window, and optimization function through hyperparameter experiments. The results show that our method outperforms other methods. We tested the effects of changes in k-mer vector length on model performance, and we show how model performance changes under various k-mer-related parameter settings. Furthermore, we investigate the effect of the attention mechanism and RNA structure data on model performance.
40
Tu Y, Lei H, Shen HB, Yang Y. SIFLoc: a self-supervised pre-training method for enhancing the recognition of protein subcellular localization in immunofluorescence microscopic images. Brief Bioinform 2022; 23:6527276. [DOI: 10.1093/bib/bbab605] [Received: 11/19/2021] [Revised: 12/15/2021] [Accepted: 12/27/2021]
Abstract
With the rapid growth of high-resolution microscopy imaging data, revealing the subcellular map of human proteins has become a central task in the study of the spatial proteome. The cell atlas of the Human Protein Atlas (HPA) provides valuable resources for recognizing subcellular localization patterns at the cell level, and its large-scale annotated data enable learning via advanced deep neural networks. However, existing predictors still suffer from imbalanced class distributions and a lack of labeled data for minor classes, so new methods are needed to cope with these issues. We leverage the self-supervised learning protocol to address these problems. Specifically, we propose SIFLoc, a pre-training scheme that enhances the conventional supervised learning framework. The pre-training features a hybrid data augmentation method and a modified contrastive loss function, and aims to learn good feature representations from microscopic images. Experiments are performed on a large-scale immunofluorescence microscopic image dataset collected from the HPA database. Using the same deep neural networks as the classifier, the model pre-trained via SIFLoc outperforms the model without pre-training by a large margin. It also shows advantages over state-of-the-art self-supervised learning methods. In particular, SIFLoc significantly improves prediction accuracy for minor organelles.
Affiliation(s)
- Yanlun Tu
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Houchao Lei
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Hong-Bin Shen
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
- Institute of Image Processing and Pattern Recognition and Key Laboratory of System Control and Information Processing, Shanghai Jiao Tong University, 200240 Shanghai, China
- Yang Yang
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China
41
Christopher JA, Geladaki A, Dawson CS, Vennard OL, Lilley KS. Subcellular Transcriptomics and Proteomics: A Comparative Methods Review. Mol Cell Proteomics 2022; 21:100186. [PMID: 34922010 PMCID: PMC8864473 DOI: 10.1016/j.mcpro.2021.100186] [Received: 07/14/2021] [Revised: 11/16/2021] [Accepted: 12/13/2021]
Abstract
The internal environment of cells is molecularly crowded, which requires spatial organization via subcellular compartmentalization. These compartments harbor specific conditions for molecules to perform their biological functions, such as coordination of the cell cycle, cell survival, and growth. This compartmentalization is also not static: molecules traffic between these subcellular neighborhoods to carry out their functions. For example, some biomolecules are multifunctional, requiring environments with differing conditions or interacting partners, and others traffic to export such molecules. Aberrant localization of proteins or RNA species has been linked to many pathological conditions, such as neurological diseases, cancer, and pulmonary diseases. Differential expression studies in transcriptomics and proteomics are relatively common, but the majority have overlooked the importance of subcellular information. In addition, subcellular transcriptomics and proteomics data do not always colocate because of the biochemical processes that occur during and after translation, highlighting the complementary nature of these fields. In this review, we discuss and directly compare the current methods in spatial proteomics and transcriptomics, including sequencing- and imaging-based strategies, to give the reader an overview of the tools currently available. We also discuss current limitations of these strategies as well as future developments in the field of spatial omics.
Affiliation(s)
- Josie A Christopher
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK; Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
- Aikaterini Geladaki
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK; Department of Genetics, University of Cambridge, Cambridge, UK
- Charlotte S Dawson
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK; Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
- Owen L Vennard
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK; Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
- Kathryn S Lilley
- Department of Biochemistry, Cambridge Centre for Proteomics, University of Cambridge, Cambridge, UK; Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, Cambridge, UK
42
Wang G, Xue MQ, Shen HB, Xu YY. Learning protein subcellular localization multi-view patterns from heterogeneous data of imaging, sequence and networks. Brief Bioinform 2022; 23:6499983. [PMID: 35018423 DOI: 10.1093/bib/bbab539] [Received: 09/08/2021] [Revised: 11/03/2021] [Accepted: 11/20/2021]
Abstract
Location proteomics seeks to provide automated high-resolution descriptions of protein location patterns within cells. Many efforts have been undertaken in location proteomics over the past decades, producing many automated predictors for protein subcellular localization. However, most of these predictors are trained solely on high-throughput microscopic images or solely on protein amino acid sequences. The unification of heterogeneous protein data sources has yet to be exploited. In this paper, we present a pipeline called sequence, image, network-based protein subcellular locator (SIN-Locator). It constructs a multi-view description of proteins by integrating multiple data types (images of protein expression in cells or tissues, amino acid sequences and protein-protein interaction networks) to classify the patterns of protein subcellular locations. Proteins were encoded by both handcrafted features and deep learning features, and multiple combining methods were implemented. Our experimental results indicate that optimal integrations can considerably enhance classification accuracy. The utility of SIN-Locator has been demonstrated by applying it to newly released proteins in the Human Protein Atlas. Furthermore, we investigate the contribution of different data sources and the influence of partially missing data. This work is anticipated to provide clues for the reconciliation and combination of multi-source data for protein location analysis.
Affiliation(s)
- Ge Wang
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Min-Qi Xue
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China; School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Ying-Ying Xu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou 510515, China
43
Götz T, Göb S, Sawant S, Erick X, Wittenberg T, Schmidkonz C, Tomé A, Lang E, Ramming A. Number of necessary training examples for Neural Networks with different number of trainable parameters. J Pathol Inform 2022; 13:100114. [PMID: 36268092 PMCID: PMC9577052 DOI: 10.1016/j.jpi.2022.100114] [Accepted: 03/12/2021]
Abstract
In this work, the aim was to reduce network complexity with a concomitant reduction in the number of necessary training examples. The focus was therefore on how appropriate evaluation metrics depend on the number of adjustable parameters of the deep neural network under consideration. The dataset comprised hematoxylin and eosin (H&E)-stained cell images provided by various clinics. We used a deep convolutional neural network to determine the relationship between a model's complexity, its concomitant set of parameters, and the size of the training sample necessary to achieve a certain classification accuracy. The complexity of the deep neural networks was reduced by pruning a certain fraction of the filters in the network. As expected, the unpruned neural network showed the best performance. Within the estimated standard error of the optimized cross-entropy loss, the network with the highest number of trainable parameters achieved the best results up to 30% pruning. Strongly pruned networks are highly viable, and their classification accuracy declines quickly with a decreasing number of training patterns. However, up to a pruning ratio of 40%, we found comparable performance between pruned and unpruned deep convolutional neural networks (DCNN) and densely connected convolutional networks (DCCN).
44
Linsley JW, Linsley DA, Lamstein J, Ryan G, Shah K, Castello NA, Oza V, Kalra J, Wang S, Tokuno Z, Javaherian A, Serre T, Finkbeiner S. Superhuman cell death detection with biomarker-optimized neural networks. Sci Adv 2021; 7:eabf8142. [PMID: 34878844 PMCID: PMC8654296 DOI: 10.1126/sciadv.abf8142] [Received: 11/20/2020] [Accepted: 10/19/2021]
Abstract
Cellular events underlying neurodegenerative disease may be captured by longitudinal live microscopy of neurons. While the advent of robot-assisted microscopy has helped scale such efforts to high-throughput regimes with the statistical power to detect transient events, time-intensive human annotation is required. We addressed this fundamental limitation with biomarker-optimized convolutional neural networks (BO-CNNs): interpretable computer vision models trained directly on biosensor activity. We demonstrate the ability of BO-CNNs to detect cell death, which is typically measured by trained annotators. BO-CNNs detected cell death with superhuman accuracy and speed by learning to identify subcellular morphology associated with cell vitality, despite receiving no explicit supervision to rely on these features. These models also revealed an intranuclear morphology signal that is difficult to spot by eye and had not previously been linked to cell death, but that reliably indicates death. BO-CNNs are broadly useful for analyzing live microscopy and essential for interpreting high-throughput experiments.
Affiliation(s)
- Jeremy W. Linsley
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Drew A. Linsley
- Robert J. & Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, RI 02912, USA
- Josh Lamstein
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Gennadi Ryan
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Kevan Shah
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Nicholas A. Castello
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Viral Oza
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Jaslin Kalra
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Shijie Wang
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Zachary Tokuno
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Ashkan Javaherian
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Thomas Serre
- Robert J. & Nancy D. Carney Institute for Brain Science, Brown University, Providence, RI 02912, USA
- Department of Cognitive, Linguistic & Psychological Sciences, Brown University, Providence, RI 02912, USA
- Steven Finkbeiner
- Center for Systems and Therapeutics, Gladstone Institutes, San Francisco, CA 94158, USA
- Taube/Koret Center for Neurodegenerative Disease, Gladstone Institutes, San Francisco, CA 94158, USA
- Departments of Neurology and Physiology, University of California, San Francisco, San Francisco, CA 94158, USA
- Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA 94158, USA
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA 94143, USA
45
Savulescu AF, Bouilhol E, Beaume N, Nikolski M. Prediction of RNA subcellular localization: Learning from heterogeneous data sources. iScience 2021; 24:103298. [PMID: 34765919 PMCID: PMC8571491 DOI: 10.1016/j.isci.2021.103298]
Abstract
RNA subcellular localization has recently emerged as a widespread phenomenon that may apply to the majority of RNAs. The two main sources of data for characterizing RNA localization are sequence features and microscopy images, such as those obtained from single-molecule fluorescent in situ hybridization-based techniques. Although such imaging data are ideal for characterizing RNA distribution, these techniques remain costly, time-consuming, and technically challenging. Given these limitations, imaging data exist for only a limited number of RNAs. We argue that the field of RNA localization would greatly benefit from complementary techniques able to characterize the location of RNA. Here we discuss the importance of RNA localization and the current methodology in the field, followed by an introduction to predicting the location of molecules. We then suggest a machine learning approach based on integrating imaging localization data and sequence-based data to assist in characterizing RNA localization at the transcriptome level.
Affiliation(s)
- Anca Flavia Savulescu
- Division of Chemical, Systems & Synthetic Biology, Institute for Infectious Disease & Molecular Medicine, Faculty of Health Sciences, University of Cape Town, 7925 Cape Town, South Africa
- Emmanuel Bouilhol
- Université de Bordeaux, Bordeaux Bioinformatics Center, Bordeaux, France
- Université de Bordeaux, CNRS, IBGC, UMR 5095, Bordeaux, France
- Nicolas Beaume
- Division of Medical Virology, Faculty of Health Sciences, University of Cape Town, 7925 Cape Town, South Africa
- Macha Nikolski
- Université de Bordeaux, Bordeaux Bioinformatics Center, Bordeaux, France
- Université de Bordeaux, CNRS, IBGC, UMR 5095, Bordeaux, France
46
Almagro J, Messal HA, Zaw Thin M, van Rheenen J, Behrens A. Tissue clearing to examine tumour complexity in three dimensions. Nat Rev Cancer 2021; 21:718-730. [PMID: 34331034 DOI: 10.1038/s41568-021-00382-w] [Accepted: 06/18/2021]
Abstract
The visualization of whole organs and organisms through tissue clearing and fluorescence volumetric imaging has revolutionized the way we look at biological samples. Its application to solid tumours is changing our perception of tumour architecture, revealing signalling networks and cell interactions critical in tumour progression, and providing a powerful new strategy for cancer diagnostics. This Review introduces the latest advances in tissue clearing and three-dimensional imaging, examines the challenges in clearing epithelia (the tissue of origin of most malignancies), and discusses the insights that tissue clearing has brought to cancer research, as well as its prospective applications in experimental and clinical oncology.
Affiliation(s)
- Jorge Almagro
- Adult Stem Cell Laboratory, The Francis Crick Institute, London, UK
- Hendrik A Messal
- Department of Molecular Pathology, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- May Zaw Thin
- Cancer Stem Cell Laboratory, Institute of Cancer Research, London, UK
- Jacco van Rheenen
- Department of Molecular Pathology, Oncode Institute, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Axel Behrens
- Adult Stem Cell Laboratory, The Francis Crick Institute, London, UK
- Cancer Stem Cell Laboratory, Institute of Cancer Research, London, UK
- Convergence Science Centre and Division of Cancer, Department of Surgery and Cancer, Imperial College London, London, UK
47
Hu JX, Yang Y, Xu YY, Shen HB. Incorporating label correlations into deep neural networks to classify protein subcellular location patterns in immunohistochemistry images. Proteins 2021; 90:493-503. [PMID: 34546597 DOI: 10.1002/prot.26244] [Received: 09/16/2020] [Revised: 03/16/2021] [Accepted: 09/13/2021]
Abstract
Analysis of protein subcellular localization is a critical part of proteomics. In recent years, both the number and quality of microscopic images have increased rapidly. Many automated methods, especially convolutional neural networks (CNN), have been developed to predict protein subcellular location(s) from bioimages, but their performance suffers from some inherent properties of the problem. First, many microscopic images have non-informative or noisy sections, such as unstained stroma and unspecific background, which affect the extraction of protein expression information. Second, the patterns of protein subcellular localization are very complex, as many proteins localize to more than one compartment. In this study, we propose a new label-correlation enhanced deep neural network, laceDNN, to classify the subcellular locations of multi-label proteins from immunohistochemistry images. The model uses small representative patches as input to alleviate the image noise issue. Its backbone is a hybrid architecture of a CNN and a recurrent neural network: the former extracts representative image features and the latter learns the organelle dependency relationships. Our experimental results indicate that the proposed model can improve the performance of multi-label protein subcellular classification.
Affiliation(s)
- Jin-Xian Hu
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yang Yang
- Department of Computer Science and Engineering, Center for Brain-Like Computing and Machine Intelligence, Shanghai Jiao Tong University, Shanghai, China
- Ying-Ying Xu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
48
DeepHistoClass: A Novel Strategy for Confident Classification of Immunohistochemistry Images Using Deep Learning. Mol Cell Proteomics 2021; 20:100140. [PMID: 34425263 PMCID: PMC8476775 DOI: 10.1016/j.mcpro.2021.100140] [Received: 04/08/2021] [Revised: 08/13/2021] [Accepted: 08/18/2021]
Abstract
A multitude of efforts worldwide aim to create a single-cell reference map of the human body, for fundamental understanding of human health, molecular medicine, and targeted treatment. Antibody-based proteomics using immunohistochemistry (IHC) has proven to be an excellent technology for integration with large-scale single-cell transcriptomics datasets. The gold standard for evaluation of IHC staining patterns is manual annotation, which is expensive and may lead to subjective errors. Artificial intelligence holds much promise for efficient and accurate pattern recognition, but confidence in prediction needs to be addressed. Here, the aim was to present a reliable and comprehensive framework for automated annotation of IHC images. We developed a multilabel classification of 7848 complex IHC images of human testis corresponding to 2794 unique proteins, generated as part of the Human Protein Atlas (HPA) project. Manual annotation data for eight different cell types was generated as a basis for training and testing a proposed Hybrid Bayesian Neural Network. By combining the deep learning model with a novel uncertainty metric, the DeepHistoClass (DHC) Confidence Score, the average diagnostic performance improved from 86.9% to 96.3%. This metric not only reveals which images are reliably classified by the model, but can also be used to identify manual annotation errors. The proposed streamlined workflow can be developed further for other tissue types in health and disease, and has important implications for digital pathology initiatives or large-scale protein mapping efforts such as the HPA project. A novel method for automated annotation of immunohistochemistry images. Introduction of an uncertainty metric, the DeepHistoClass (DHC) confidence score. Increased accuracy of automated image predictions. Identification of manual annotation errors.
49
Li J, Peng J, Jiang X, Rea AC, Peng J, Hu J. DeepLearnMOR: a deep-learning framework for fluorescence image-based classification of organelle morphology. Plant Physiol 2021; 186:1786-1799. [PMID: 34618108 PMCID: PMC8331148 DOI: 10.1093/plphys/kiab223] [Received: 11/20/2020] [Accepted: 04/11/2021]
Abstract
The proper biogenesis, morphogenesis, and dynamics of subcellular organelles are essential to their metabolic functions. Conventional techniques for identifying, classifying, and quantifying abnormalities in organelle morphology are largely manual and time-consuming, and require specific expertise. Deep learning has the potential to revolutionize image-based screens by greatly improving their scope, speed, and efficiency. Here, we used transfer learning and a convolutional neural network (CNN) to analyze over 47,000 confocal microscopy images from Arabidopsis wild-type and mutant plants with abnormal division of one of three essential energy organelles: chloroplasts, mitochondria, or peroxisomes. We have built a deep-learning framework, DeepLearnMOR (Deep Learning of the Morphology of Organelles), which can rapidly classify image categories and identify abnormalities in organelle morphology with over 97% accuracy. Feature visualization analysis identified important features used by the CNN to predict morphological abnormalities, and visual clues helped to better understand the decision-making process, thereby validating the reliability and interpretability of the neural network. This framework establishes a foundation for future larger-scale research with broader scopes and greater data set diversity and heterogeneity.
Affiliation(s)
- Jiying Li
- Microsoft Corporation, Redmond, Washington 98052
| | - Jinghao Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
| | - Xiaotong Jiang
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, Michigan 48824
| | - Anne C Rea
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, Michigan 48824
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Jianping Hu
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, Michigan 48824
- Author for communication:
50
Vo-Phamhi JM, Yamauchi KA, Gómez-Sjöberg R. Validation and tuning of in situ transcriptomics image processing workflows with crowdsourced annotations. PLoS Comput Biol 2021; 17:e1009274. [PMID: 34370726 PMCID: PMC8376178 DOI: 10.1371/journal.pcbi.1009274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2020] [Revised: 08/19/2021] [Accepted: 07/14/2021] [Indexed: 11/24/2022]
Abstract
Recent advancements in in situ methods, such as multiplexed in situ RNA hybridization and in situ RNA sequencing, have deepened our understanding of the way biological processes are spatially organized in tissues. Automated image processing and spot-calling algorithms for analyzing in situ transcriptomics images have many parameters which need to be tuned for optimal detection. Having ground truth datasets (images where there is very high confidence on the accuracy of the detected spots) is essential for evaluating these algorithms and tuning their parameters. We present a first-in-kind open-source toolkit and framework for in situ transcriptomics image analysis that incorporates crowdsourced annotations, alongside expert annotations, as a source of ground truth for the analysis of in situ transcriptomics images. The kit includes tools for preparing images for crowdsourcing annotation to optimize crowdsourced workers' ability to annotate these images reliably, performing quality control (QC) on worker annotations, extracting candidate parameters for spot-calling algorithms from sample images, tuning parameters for spot-calling algorithms, and evaluating spot-calling algorithms and worker performance. These tools are wrapped in a modular pipeline with a flexible structure that allows users to take advantage of crowdsourced annotations from any source of their choice. We tested the pipeline using real and synthetic in situ transcriptomics images and annotations from the Amazon Mechanical Turk system obtained via Quanti.us. Using real images from in situ experiments and simulated images produced by one of the tools in the kit, we studied worker sensitivity to spot characteristics and established rules for annotation QC. We explored and demonstrated the use of ground truth generated in this way for validating spot-calling algorithms and tuning their parameters, and confirmed that consensus crowdsourced annotations are a viable substitute for expert-generated ground truth for these purposes.
Affiliation(s)
- Jenny M. Vo-Phamhi
- Chan Zuckerberg Biohub, San Francisco, California, United States of America
- Kevin A. Yamauchi
- Chan Zuckerberg Biohub, San Francisco, California, United States of America