1
Zou K, Wang S, Wang Z, Zhang Z, Yang F. HAR_Locator: a novel protein subcellular location prediction model of immunohistochemistry images based on hybrid attention modules and residual units. Front Mol Biosci 2023; 10:1171429. [PMID: 37664182 PMCID: PMC10470064 DOI: 10.3389/fmolb.2023.1171429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023]
Abstract
Introduction: Proteins located in subcellular compartments play an indispensable role in the physiological functions of eukaryotic organisms. Knowing the pattern of protein subcellular localization helps in understanding the mechanisms and functions of proteins, contributes to investigating pathological changes in cells, and provides technical support for targeted drug research on human diseases. Because the number of known proteins has risen considerably, automated systems based on featurization or representation learning and classifier design have attracted interest for predicting the subcellular location of proteins. However, general deep learning models tend to lose feature information from large-scale, fine-grained protein microscopy images, and the shallow features derived from statistical methods have weak supervision ability.
Methods: In this work, a novel model called HAR_Locator was developed to predict the subcellular location of proteins by concatenating multi-view abstract features and shallow features. Its advantages can be summarized in three points. Firstly, to obtain discriminative abstract feature information on protein subcellular location, an abstract feature extractor called HARnet, based on Hybrid Attention modules and Residual units, was proposed to relieve gradient dispersion and focus on protein-target regions. Secondly, concatenating abstract features and shallow features not only improves the supervision ability of image information but also enhances the generalization ability of HAR_Locator. Finally, a multi-category multi-classifier decision system based on an Artificial Neural Network (ANN) was introduced to obtain the final output for each sample by fitting the most representative result from five subset predictors.
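The fusion scheme described in the Methods paragraph (concatenated abstract and shallow features, five subset predictors, and an ANN that combines their outputs) resembles a stacking ensemble. The following is a minimal NumPy sketch under stated assumptions, not the authors' HAR_Locator code: the random feature subsets, the nearest-centroid base predictors, and the single-layer softmax meta-classifier standing in for the ANN are all hypothetical choices, and the features are synthetic rather than HARnet outputs.

```python
import numpy as np

def concat_features(abstract_feats, shallow_feats):
    """Concatenate deep (abstract) and hand-crafted (shallow) features sample by sample."""
    return np.hstack([abstract_feats, shallow_feats])

def centroid_proba(X_train, y_train, X, n_classes):
    """Hypothetical base predictor: softmax over negative squared distances to class centroids."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in range(n_classes)])
    logits = -((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def fit_softmax(Z, y, n_classes, lr=0.5, epochs=300):
    """Single-layer softmax meta-classifier, an assumed stand-in for the ANN decision system."""
    W = np.zeros((Z.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = Z @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(y)  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * Z.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def stacked_predict(X_train, y_train, X_test, n_classes, n_subsets=5, seed=0):
    """Five subset predictors on random feature subsets; their probabilities feed the meta-classifier."""
    rng = np.random.default_rng(seed)
    n_feat = X_train.shape[1]
    subsets = [rng.choice(n_feat, size=max(1, n_feat // 2), replace=False) for _ in range(n_subsets)]
    Z_train = np.hstack([centroid_proba(X_train[:, s], y_train, X_train[:, s], n_classes) for s in subsets])
    Z_test = np.hstack([centroid_proba(X_train[:, s], y_train, X_test[:, s], n_classes) for s in subsets])
    W, b = fit_softmax(Z_train, y_train, n_classes)
    return np.argmax(Z_test @ W + b, axis=1)

# Toy example: two well-separated classes with 8 "abstract" and 4 "shallow" features each.
rng = np.random.default_rng(1)
abstract = np.vstack([rng.normal(0, 1, (30, 8)), rng.normal(3, 1, (30, 8))])
shallow = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
y = np.array([0] * 30 + [1] * 30)
X = concat_features(abstract, shallow)
pred = stacked_predict(X, y, X, n_classes=2)
print("training accuracy:", (pred == y).mean())
```

For brevity the meta-classifier is trained on in-sample base-predictor outputs; out-of-fold predictions are the usual way to keep training labels from leaking into the meta level.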
Results: To evaluate the model, a collection of 6,778 immunohistochemistry (IHC) images from the Human Protein Atlas (HPA) database was used. Compared with baseline predictors, the accuracy, precision, and recall increased significantly, to 84.73%, 84.77%, and 84.70%, respectively.
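The abstract reports accuracy, precision, and recall but does not say how precision and recall are averaged over location classes. The sketch below assumes macro-averaging (an unweighted mean over classes) and computes all three values from a confusion matrix; `classification_metrics` is a hypothetical helper name.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision and recall (the averaging scheme is an assumption)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)
    actual = cm.sum(axis=1)
    # Classes that are never predicted (or never present) contribute 0 instead of dividing by zero.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean()

acc, prec, rec = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```

Macro-averaging weights every location class equally; if a weighted or micro average was used instead, the three numbers would generally differ.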
Affiliation(s)
- Kai Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Simeng Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Ziqian Wang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Zhihai Zhang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
- Artificial Intelligence and Bioinformation Cognition Laboratory, Jiangxi Science and Technology Normal University, Nanchang, China
2
Vodovotz Y. Towards systems immunology of critical illness at scale: from single cell 'omics to digital twins. Trends Immunol 2023; 44:345-355. [PMID: 36967340 PMCID: PMC10147586 DOI: 10.1016/j.it.2023.03.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/20/2023] [Revised: 03/06/2023] [Accepted: 03/07/2023] [Indexed: 04/05/2023]
Abstract
Single-cell 'omics methodology has yielded unprecedented insights based largely on data-centric informatics for reducing, and thus interpreting, massive datasets. In parallel, parsimonious mathematical modeling based on abstractions of pathobiology has also yielded major insights into inflammation and immunity, and these models have been extended to describe multi-organ disease pathophysiology as the basis of 'digital twins' and in silico clinical trials. Integrating these distinct methods at scale can drive both basic and translational advances, especially in the context of critical illness, including diseases such as COVID-19. Here, I explore achievements in this area and discuss the challenges inherent to integrating data-driven and mechanistic modeling approaches, highlighting the potential of modeling-based strategies for rational immune system reprogramming.
Affiliation(s)
- Yoram Vodovotz
- Department of Surgery, University of Pittsburgh, Pittsburgh, PA 15213, USA; Center for Inflammation and Regeneration Modeling, McGowan Institute for Regenerative Medicine, University of Pittsburgh, Pittsburgh, PA 15219, USA; Center for Systems Immunology, University of Pittsburgh, Pittsburgh, PA 15219, USA.
3
Deep localization of subcellular protein structures from fluorescence microscopy images. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06715-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
4
Cottle L, Gilroy I, Deng K, Loudovaris T, Thomas HE, Gill AJ, Samra JS, Kebede MA, Kim J, Thorn P. Machine Learning Algorithms, Applied to Intact Islets of Langerhans, Demonstrate Significantly Enhanced Insulin Staining at the Capillary Interface of Human Pancreatic β Cells. Metabolites 2021; 11:metabo11060363. [PMID: 34200432 PMCID: PMC8229564 DOI: 10.3390/metabo11060363] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/03/2021] [Revised: 05/27/2021] [Accepted: 05/28/2021] [Indexed: 11/16/2022]
Abstract
Pancreatic β cells secrete the hormone insulin into the bloodstream and are critical in the control of blood glucose concentrations. β cells are clustered in the micro-organs of the islets of Langerhans, which have a rich capillary network. Recent work has highlighted the intimate spatial connections between β cells and these capillaries, which lead to the targeting of insulin secretion to the region where the β cells contact the capillary basement membrane. In addition, β cells orientate with respect to the capillary contact point and many proteins are differentially distributed at the capillary interface compared with the rest of the cell. Here, we set out to develop an automated image analysis approach to identify individual β cells within intact islets and to determine if the distribution of insulin across the cells was polarised. Our results show that a U-Net machine learning algorithm correctly identified β cells and their orientation with respect to the capillaries. Using this information, we then quantified insulin distribution across the β cells to show enrichment at the capillary interface. We conclude that machine learning is a useful analytical tool to interrogate large image datasets and analyse sub-cellular organisation.
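Once a segmentation model (here, the U-Net) has produced a mask for each β cell and for the capillaries, the polarisation question reduces to comparing insulin intensity near the capillary interface with intensity in the rest of the cell. The sketch below is an illustration under assumptions, not the authors' quantification: the masks are given, the 2-pixel "interface band" and the near/far intensity ratio are hypothetical choices, and `polarisation_ratio` is a hypothetical helper.

```python
import numpy as np

def distance_to_mask(mask):
    """Euclidean distance from every pixel to the nearest True pixel of `mask` (brute force, small images)."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        raise ValueError("mask has no foreground pixels")
    gy, gx = np.indices(mask.shape)
    d2 = (gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2
    return np.sqrt(d2.min(axis=-1))

def polarisation_ratio(insulin, cell_mask, capillary_mask, band=2.0):
    """Mean insulin within `band` pixels of the capillary divided by mean insulin in the rest of the cell."""
    dist = distance_to_mask(capillary_mask)
    near = cell_mask & (dist <= band)
    far = cell_mask & (dist > band)
    if not near.any() or not far.any():
        raise ValueError("cell does not span both regions")
    return insulin[near].mean() / insulin[far].mean()

# Toy cell touching a capillary along column 0, with insulin enriched next to the contact.
insulin = np.ones((10, 10))
insulin[:, 1:3] = 2.0
cell = np.zeros((10, 10), dtype=bool)
cell[2:8, 1:9] = True
capillary = np.zeros((10, 10), dtype=bool)
capillary[:, 0] = True
ratio = polarisation_ratio(insulin, cell, capillary)
print(ratio)  # 2.0
```

A ratio above 1 indicates enrichment at the capillary interface; on real images a distance transform (for example from SciPy) would replace the brute-force distance computation.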
Affiliation(s)
- Louise Cottle
- Charles Perkins Centre, School of Medical Sciences, University of Sydney, Camperdown 2006, Australia
- Ian Gilroy
- School of Computer Science, University of Sydney, Camperdown 2006, Australia
- Kylie Deng
- Charles Perkins Centre, School of Medical Sciences, University of Sydney, Camperdown 2006, Australia
- Helen E Thomas
- St Vincent's Institute, Fitzroy 3065, Australia
- Department of Medicine, St Vincent's Hospital, University of Melbourne, Fitzroy 3065, Australia
- Anthony J Gill
- Northern Clinical School, University of Sydney, St Leonards 2065, Australia
- Department of Anatomical Pathology, Royal North Shore Hospital, St Leonards 2065, Australia
- Cancer Diagnosis and Pathology Research Group, Kolling Institute of Medical Research, St Leonards 2065, Australia
- Jaswinder S Samra
- Northern Clinical School, University of Sydney, St Leonards 2065, Australia
- Upper Gastrointestinal Surgical Unit, Royal North Shore Hospital, St Leonards 2065, Australia
- Melkam A Kebede
- Charles Perkins Centre, School of Medical Sciences, University of Sydney, Camperdown 2006, Australia
- Jinman Kim
- School of Computer Science, University of Sydney, Camperdown 2006, Australia
- Peter Thorn
- Charles Perkins Centre, School of Medical Sciences, University of Sydney, Camperdown 2006, Australia
5
Su R, He L, Liu T, Liu X, Wei L. Protein subcellular localization based on deep image features and criterion learning strategy. Brief Bioinform 2020; 22:6035269. [PMID: 33320936 DOI: 10.1093/bib/bbaa313] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/22/2020] [Revised: 09/26/2020] [Accepted: 10/14/2020] [Indexed: 01/05/2023]
Abstract
The spatial distribution of the proteome at the subcellular level provides clues to protein function and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at the subcellular level. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction is a particularly challenging task. Here we developed a criterion learning strategy to exploit label-attribute relevancy and label-label relevancy. The criterion used to determine the final label set was obtained automatically during the learning procedure. We also identified the CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, deep features give more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.
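The abstract says a decision criterion for the final multi-label set is learned rather than fixed. As a simple point of comparison, and explicitly not the authors' criterion learning strategy, the sketch below learns one score threshold per location label by maximizing F1 on training data, then forces at least one label per protein. The function names, the threshold grid, and the "at least one label" rule are assumptions.

```python
import numpy as np

def f1(y_true, y_pred):
    """F1 score for one label from boolean truth and prediction vectors."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def learn_thresholds(scores, labels, grid=np.linspace(0.05, 0.95, 19)):
    """For each label column, pick the grid threshold that maximizes F1 on the training scores."""
    labels = labels.astype(bool)
    thresholds = np.empty(scores.shape[1])
    for j in range(scores.shape[1]):
        f1s = [f1(labels[:, j], scores[:, j] >= t) for t in grid]
        thresholds[j] = grid[int(np.argmax(f1s))]
    return thresholds

def predict_label_sets(scores, thresholds):
    """Apply per-label thresholds; fall back to the top-scoring label if a sample gets none."""
    label_sets = scores >= thresholds
    empty = ~label_sets.any(axis=1)
    label_sets[empty, np.argmax(scores[empty], axis=1)] = True
    return label_sets

# Toy CNN output scores for 4 proteins and 2 locations, with their true label sets.
scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.1, 0.6], [0.3, 0.1]])
labels = np.array([[1, 0], [1, 1], [0, 1], [1, 0]])
thresholds = learn_thresholds(scores, labels)
pred = predict_label_sets(scores, thresholds)
print(thresholds, pred.astype(int).tolist())
```

Thresholds tuned on training data tend to be optimistic; in practice they would be chosen on a validation split.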
Affiliation(s)
- Ran Su
- School of Computer Software, College of Intelligence and Computing, Tianjin University, China
- Linlin He
- School of Computer Software, College of Intelligence and Computing, Tianjin University, China
- Tianling Liu
- School of Computer Software, College of Intelligence and Computing, Tianjin University, China
- Xiaofeng Liu
- Key Laboratory of Breast Cancer Prevention and Therapy, Ministry of Education, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center of Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, China
- Leyi Wei
- School of Software, Shandong University, China
6
Schormann W, Hariharan S, Andrews DW. A reference library for assigning protein subcellular localizations by image-based machine learning. J Cell Biol 2020; 219:133635. [PMID: 31968357 PMCID: PMC7055006 DOI: 10.1083/jcb.201904090] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/16/2019] [Revised: 09/30/2019] [Accepted: 12/15/2019] [Indexed: 12/11/2022]
Abstract
Confocal micrographs of EGFP fusion proteins localized at key cell organelles in murine and human cells were acquired for use as subcellular localization landmarks. For each of the 789,011 murine and 523,319 human optically validated cell images, morphology and statistical features were measured. Machine learning algorithms using these features permit automated assignment of the localization of other proteins and dyes in both cell types with very high accuracy. Automated assignment of subcellular localizations for model tail-anchored proteins with randomly mutated C-terminal targeting sequences allowed the discovery of motifs responsible for targeting to mitochondria, the endoplasmic reticulum, and the late secretory pathway. Analysis of directed mutants enabled refinement of these motifs and characterization of protein distributions within cellular subcompartments.
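The abstract does not list the morphology and statistical features that were measured per cell. The sketch below computes a few common examples for one segmented cell (area, intensity statistics, and an eccentricity derived from second-order central moments) to show what such a feature vector can look like; the feature choice and the `cell_features` name are assumptions, not the authors' feature set.

```python
import numpy as np

def cell_features(image, mask):
    """A few morphology and intensity statistics for one segmented cell (illustrative subset)."""
    ys, xs = np.nonzero(mask)
    values = image[ys, xs].astype(float)
    cy, cx = ys.mean(), xs.mean()
    # Second-order central moments of the mask describe its spread and elongation.
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    # Eigenvalues of the 2x2 moment (covariance) matrix.
    common = np.sqrt(((mu20 - mu02) / 2) ** 2 + mu11 ** 2)
    lam1 = (mu20 + mu02) / 2 + common
    lam2 = (mu20 + mu02) / 2 - common
    eccentricity = np.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0
    return {
        "area": len(ys),
        "mean_intensity": values.mean(),
        "integrated_intensity": values.sum(),
        "intensity_std": values.std(),
        "eccentricity": float(eccentricity),
    }

# A uniformly bright 3x3 square: area 9, integrated intensity 45, eccentricity close to 0.
image = np.zeros((6, 6))
image[1:4, 1:4] = 5.0
f = cell_features(image, image > 0)
print(f)
```

Vectors like this, one per cell, are what a downstream classifier would be trained on to assign localizations.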
Affiliation(s)
- Wiebke Schormann
- Biological Sciences, Sunnybrook Research Institute, Toronto, Canada
- David W Andrews
- Biological Sciences, Sunnybrook Research Institute, Toronto, Canada
- Department of Biochemistry, University of Toronto, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
7
Tran DH, Meunier M, Cheriet F. OrgaNet: A Robust Network for Subcellular Organelles Classification in Fluorescence Microscopy Images. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2020:1887-1890. [PMID: 33018369 DOI: 10.1109/embc44109.2020.9175162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Automatic identification of the subcellular compartments of proteins in fluorescence microscopy images is an important task for quantitatively evaluating cellular processes. A common problem in developing deep learning based classifiers is that only a limited number of labeled images is available for training. To address this challenge, we propose a new approach for subcellular organelle classification that combines an effective and efficient architecture based on a compact Convolutional Neural Network with a deep embedded clustering algorithm. We validate our approach on a benchmark of HeLa cell microscopy images. The network achieves higher accuracy than state-of-the-art methods while having a significantly smaller number of parameters. More interestingly, experimental results show that our method is strongly robust to limited labeled training data, requiring only a quarter of the usual amount of annotated data while maintaining a high accuracy of 93.9%.
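The abstract names deep embedded clustering but not its exact formulation in OrgaNet. The sketch below shows the standard DEC building blocks, assuming the CNN embeddings and initial cluster centres are already available: a Student's t soft assignment Q, a sharpened target distribution P, and the KL(P || Q) clustering loss. The toy embeddings and alpha = 1 are assumptions.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t similarity between embeddings z (n, d) and cluster centres (k, d); rows sum to 1."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q, divide by cluster frequency, renormalize each row."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Clustering loss KL(P || Q), averaged over samples."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))) / len(p))

# Four toy embeddings forming two obvious groups, with one centre per group.
z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
q = soft_assign(z, centroids)
p = target_distribution(q)
print(np.round(q, 3), np.round(p, 3), kl_divergence(p, q))
```

During training, the embedding network and centres would be updated to reduce this loss, pulling confident assignments tighter; that is one way unlabeled images can contribute when labels are scarce.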
8
Rahaman MM, Ahsan MA, Chen M. Data-mining Techniques for Image-based Plant Phenotypic Traits Identification and Classification. Sci Rep 2019; 9:19526. [PMID: 31862925 PMCID: PMC6925301 DOI: 10.1038/s41598-019-55609-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 06/25/2019] [Accepted: 11/21/2019] [Indexed: 11/09/2022]
Abstract
Statistical data-mining (DM) and machine learning (ML) are promising tools to assist in the analysis of complex datasets. In recent decades, with the development of precision agriculture, plant phenomics has become crucial for high-throughput phenotyping of local crop cultivars. Therefore, an integrated or new analytical approach is needed to deal with these phenomics data. We proposed a statistical framework for the analysis of phenomics data that integrates DM and ML methods. The most popular supervised ML methods, namely Linear Discriminant Analysis (LDA), Random Forest (RF), and Support Vector Machines with linear (SVM-l) and radial basis function (SVM-r) kernels, were used to classify plant status (stress/non-stress) and validate the proposed approach. Several simulated and real plant phenotype datasets were analyzed. The results demonstrated the significant contribution of the features selected by the proposed approach throughout the analysis. In this study, we showed that the proposed approach simplified phenotype data analysis, reduced the computational time of the ML algorithms, and increased prediction accuracy.
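The abstract does not specify the data-mining step that selects features before classification. As a hedged illustration of the general idea only, the sketch below ranks phenotypic traits by the absolute Welch t statistic between stress and non-stress plants and keeps the top k; the statistic, the value of k, and the synthetic data are all assumptions, not the authors' framework.

```python
import numpy as np

def welch_t(x_a, x_b):
    """Welch's t statistic for every feature (column) between two groups of samples (rows)."""
    mean_diff = x_a.mean(axis=0) - x_b.mean(axis=0)
    se = np.sqrt(x_a.var(axis=0, ddof=1) / len(x_a) + x_b.var(axis=0, ddof=1) / len(x_b))
    return mean_diff / se  # a constant feature gives se == 0, so filter those out beforehand

def select_features(X, y, k):
    """Indices of the k features that best separate stress (y == 1) from non-stress (y == 0)."""
    t = welch_t(X[y == 1], X[y == 0])
    return np.argsort(-np.abs(t))[:k]

# Toy data: 20 non-stress and 20 stress plants, 5 traits; only traits 2 and 4 carry signal.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(40, 5))
X[y == 1, 2] += 4.0
X[y == 1, 4] += 2.5
selected = select_features(X, y, k=2)
print(sorted(selected.tolist()))
```

The reduced matrix `X[:, selected]` would then be passed to LDA, RF, or SVM classifiers; fewer input features is one plausible source of the reduced computational time the abstract reports.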
Affiliation(s)
- Md Matiur Rahaman
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh
- Md Asif Ahsan
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China.
9
Xiang S, Liang Q, Hu Y, Tang P, Coppola G, Zhang D, Sun W. AMC-Net: Asymmetric and multi-scale convolutional neural network for multi-label HPA classification. Comput Methods Programs Biomed 2019; 178:275-287. [PMID: 31416555 DOI: 10.1016/j.cmpb.2019.07.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/20/2019] [Revised: 06/20/2019] [Accepted: 07/06/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVES The multi-label Human Protein Atlas (HPA) classification can yield a better understanding of human diseases and help doctors enhance the automatic analysis of biomedical images. Existing automatic protein recognition methods have been limited to single patterns. Therefore, an automatic multi-label human protein atlas recognition system with satisfactory performance is needed. This work aims to build an automatic recognition system for multi-label human protein atlas classification based on deep learning. METHODS In this work, an automatic feature extraction and multi-label classification framework is proposed. Specifically, an asymmetric and multi-scale convolutional neural network is designed for HPA classification. Furthermore, this work introduces a combined loss that consists of the binary cross-entropy and F1-score losses to improve identification performance. RESULTS Rigorous experiments are conducted to evaluate the proposed system. In particular, unlike current automatic identification systems, which focus on a limited number of patterns, the proposed method can classify mixed patterns of proteins in microscope images and can handle the subcellular multi-label protein classification task with 28 subcellular localization patterns. The proposed framework based on a deep convolutional neural network outperformed existing approaches with an F1-score of 0.823, which illustrates the robustness and effectiveness of the proposed system. CONCLUSION This study proposed a high-performance recognition system for protein atlas classification based on deep learning, achieving an automatic multi-label human protein atlas identification framework that outperforms previous studies.
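The combined loss mentioned in METHODS pairs binary cross-entropy with an F1-based term. F1 itself is not differentiable, so a common approach is a "soft" F1 computed from predicted probabilities. The NumPy sketch below illustrates that idea; the equal weighting of the two terms, the per-label averaging, and the `eps` values are assumptions, and in training the same expressions would be written in an autodiff framework.

```python
import numpy as np

def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels."""
    p = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p)))

def soft_f1_loss(probs, targets, eps=1e-7):
    """1 minus the mean over labels of an F1 computed from probabilities instead of hard decisions."""
    tp = (probs * targets).sum(axis=0)
    fp = (probs * (1 - targets)).sum(axis=0)
    fn = ((1 - probs) * targets).sum(axis=0)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return float(1 - f1.mean())

def combined_loss(probs, targets, weight=1.0):
    """BCE plus a weighted soft-F1 term (the weight is a hypothetical hyperparameter)."""
    return bce_loss(probs, targets) + weight * soft_f1_loss(probs, targets)

# Two images, two localization labels: confident correct vs confident wrong predictions.
targets = np.array([[1, 0], [0, 1]], dtype=float)
good = np.array([[0.99, 0.01], [0.01, 0.99]])
bad = 1 - good
print(combined_loss(good, targets), combined_loss(bad, targets))
```

With 28 localization patterns, `probs` and `targets` would simply have 28 columns; the soft-F1 term helps rare labels that contribute little to the BCE average.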
Affiliation(s)
- Shao Xiang
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
- Qiaokang Liang
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China.
- Yucheng Hu
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
- Pen Tang
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
- Gianmarc Coppola
- Faculty of Engineering and Applied Science, University of Ontario Institute of Technology, Oshawa, Ontario, L1H 7K4, Canada
- Dan Zhang
- Department of Mechanical Engineering, York University, Toronto, ON M3J 1P3, Canada
- Wei Sun
- College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; Hunan Key Laboratory of Intelligent Robot Technology in Electronic Manufacturing, Hunan University, Changsha 410082, China; National Engineering Laboratory for Robot Vision Perception and Control technologies, Hunan University, Changsha 410082, China
10
Fluorescence microscopy image classification of 2D HeLa cells based on the CapsNet neural network. Med Biol Eng Comput 2019; 57:1187-1198. [PMID: 30687900 DOI: 10.1007/s11517-018-01946-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/05/2018] [Accepted: 12/17/2018] [Indexed: 10/27/2022]
Abstract
The development of computer technology now allows the quick and efficient automatic generation of large numbers of fluorescence microscopy images of proteins in specific subcellular compartments. Digital image processing and pattern recognition technology can classify these images, identify the subcellular location of proteins, and subsequently support related work such as the analysis and investigation of protein function. Here, based on a 2D fluorescence microscopy image dataset of HeLa cells, the CapsNet network model was used to classify images of proteins in ten different subcellular compartments. Capsules in the CapsNet model were trained to capture the likelihood of certain features and their variants rather than the characteristics of a specific variant. Capsules at one level predicted the instantiation parameters of higher-level capsules through transformation matrices, and a higher-level capsule became active when multiple predictions agreed under dynamic routing. Experiments show that using the CapsNet network model to classify 2D HeLa datasets can achieve higher accuracy.
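The routing mechanism described above (lower capsules predict higher-capsule parameters; a higher capsule activates when predictions agree) is routing by agreement. Below is a minimal NumPy sketch of the squash nonlinearity and the routing loop, assuming the predictions `u_hat` (each lower capsule's output already multiplied by its transformation matrix) are given; the shapes and three iterations are assumptions, not values from this paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink short vectors toward 0 and long vectors toward unit length, keeping their direction."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement.

    u_hat: predictions of shape (n_lower, n_upper, dim) for each lower capsule i and
    upper capsule j. Returns upper-capsule output vectors of shape (n_upper, dim).
    """
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))  # routing logits, start uniform
    for _ in range(n_iters):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)  # coupling coefficients: softmax over upper capsules
        s = (c[:, :, None] * u_hat).sum(axis=0)  # weighted sum of predictions per upper capsule
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement = dot product with the output
    return v

# Four lower capsules agree on upper capsule 0 but give cancelling predictions for capsule 1.
u_hat = np.zeros((4, 2, 2))
u_hat[:, 0, :] = [2.0, 0.0]
u_hat[:, 1, :] = [[1, 0], [-1, 0], [0, 1], [0, -1]]
v = dynamic_routing(u_hat)
print(np.linalg.norm(v, axis=1))  # capsule 0 long (active), capsule 1 near zero
```

The length of each output vector serves as the probability that the entity it represents (here, a subcellular pattern) is present.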
11
Deep learning is combined with massive-scale citizen science to improve large-scale image classification. Nat Biotechnol 2018; 36:820-828. [PMID: 30125267 DOI: 10.1038/nbt.4225] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 12/24/2017] [Accepted: 07/19/2018] [Indexed: 01/11/2023]
Abstract
Pattern recognition and classification of images are key challenges throughout the life sciences. We combined two approaches for large-scale classification of fluorescence microscopy images. First, using the publicly available data set from the Cell Atlas of the Human Protein Atlas (HPA), we integrated an image-classification task into a mainstream video game (EVE Online) as a mini-game, named Project Discovery. Participation by 322,006 gamers over 1 year provided nearly 33 million classifications of subcellular localization patterns, including patterns that were not previously annotated by the HPA. Second, we used deep learning to build an automated Localization Cellular Annotation Tool (Loc-CAT). This tool classifies proteins into 29 subcellular localization patterns and can deal efficiently with multi-localization proteins, performing robustly across different cell types. Combining the annotations of gamers and deep learning, we applied transfer learning to create a boosted learner that can characterize subcellular protein distribution with F1 score of 0.72. We found that engaging players of commercial computer games provided data that augmented deep learning and enabled scalable and readily improved image classification.
12
Gregoretti F, Cesarini E, Lanzuolo C, Oliva G, Antonelli L. An Automatic Segmentation Method Combining an Active Contour Model and a Classification Technique for Detecting Polycomb-group Proteins in High-Throughput Microscopy Images. Methods Mol Biol 2016; 1480:181-197. [PMID: 27659985 DOI: 10.1007/978-1-4939-6380-5_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/06/2023]
Abstract
The large amount of data generated in biological experiments that rely on advanced microscopy can be handled only with automated image analysis. Most analyses require reliable cell image segmentation, possibly also capable of detecting subcellular structures. We present an automatic segmentation method to detect Polycomb group (PcG) protein areas isolated from nuclear regions in high-resolution fluorescent cell image stacks. It combines two segmentation algorithms, one using an active contour model and one using a classification technique, and serves as a tool to better understand the three-dimensional subcellular distribution of PcG proteins in live cell image sequences. We obtained accurate results across several cell image datasets, coming from different cell types and corresponding to different fluorescent labels, without requiring elaborate adjustments to each dataset.
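The method above combines an active contour model with a classification technique; neither is specified in enough detail here to reproduce. As a much simpler stand-in that illustrates the idea of classifying pixels into foreground and background by intensity, the sketch below implements Otsu's global threshold in NumPy. This is a swapped-in technique, not the authors' algorithm, and `otsu_threshold` is a hypothetical helper.

```python
import numpy as np

def otsu_threshold(image, n_bins=256):
    """Intensity threshold that maximizes the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(image.ravel(), bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()
    w0 = np.cumsum(p)           # weight of the "background" class (bins 0..k)
    w1 = 1.0 - w0               # weight of the "foreground" class (bins k+1..)
    m = np.cumsum(p * centers)  # cumulative first moment
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = m / w0
        mu1 = (m[-1] - m) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
    k = int(np.argmax(np.nan_to_num(between)))
    return edges[k + 1]         # upper edge of bin k: pixels above it are foreground

# Toy image: dim background with a bright square standing in for a labeled region.
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
mask = image > otsu_threshold(image)

rng = np.random.default_rng(0)
noisy = rng.normal(0.2, 0.05, (32, 32))
noisy[8:24, 8:24] += 0.6
noisy_mask = noisy > otsu_threshold(noisy)
print(mask.sum(), noisy_mask.sum())
```

A global threshold ignores shape and local context, which is exactly what contour-based models add; the sketch only shows the pixel-classification half of the idea.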
Affiliation(s)
- Francesco Gregoretti
- Institute for High Performance Computing and Networking, ICAR-CNR, via Pietro Castellino 111, Naples, 80131, Italy
- Elisa Cesarini
- Institute of Cellular Biology and Neurobiology, IRCCS Santa Lucia Foundation, via del Fosso di Fiorano 64, Rome, 00143, Italy
- Chiara Lanzuolo
- Institute of Cellular Biology and Neurobiology, IRCCS Santa Lucia Foundation, via del Fosso di Fiorano 64, Rome, 00143, Italy
- Istituto Nazionale Genetica Molecolare "Romeo ed Enrica Invernizzi", via Francesco Sforza 35, Milan, 20122, Italy
- Gennaro Oliva
- Institute for High Performance Computing and Networking, ICAR-CNR, via Pietro Castellino 111, Naples, 80131, Italy
- Laura Antonelli
- Institute for High Performance Computing and Networking, ICAR-CNR, via Pietro Castellino 111, Naples, 80131, Italy.
13
Lin SL, Chang FL, Ho SY, Charoenkwan P, Wang KW, Huang HL. Predicting Neuroinflammation in Morphine Tolerance for Tolerance Therapy from Immunostaining Images of Rat Spinal Cord. PLoS One 2015; 10:e0139806. [PMID: 26437460 PMCID: PMC4593634 DOI: 10.1371/journal.pone.0139806] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 01/22/2015] [Accepted: 09/17/2015] [Indexed: 02/07/2023]
Abstract
Long-term morphine treatment leads to tolerance, which attenuates the analgesic effect and hampers clinical use. Recent studies have sought to reveal the mechanism of opioid receptors and neuroinflammation by observing morphological changes of cells in the rat spinal cord. This work proposes a high-content screening (HCS) based computational method, HCS-Morph, for predicting neuroinflammation in morphine tolerance from immunostaining images of astrocytes, microglia, and neurons in the spinal cord, to facilitate the development of tolerance therapy. HCS-Morph first extracts numerous HCS-based features of cellular phenotypes. Next, an inheritable bi-objective genetic algorithm is used to identify a minimal set of features by maximizing the prediction accuracy of neuroinflammation. Finally, a mathematical model using a support vector machine with the identified features is established to predict drug-treated images and assess the effects of tolerance therapy. The dataset consists of 15 saline controls (1 μl/h), 15 morphine-tolerant rats (15 μg/h), and 10 rats receiving a co-infusion of morphine (15 μg/h) and gabapentin (15 μg/h, Sigma). The three individual models of astrocytes, microglia, and neurons for predicting neuroinflammation yielded respective Jackknife test accuracies of 96.67%, 90.00%, and 86.67% on the 30 rats, and respective independent test accuracies of 100%, 90%, and 60% on the 10 co-infused rats. The experimental results suggest that neuroinflammation is expressed more predominantly in astrocytes and microglia than in neurons. The set of features for predicting neuroinflammation from images of astrocytes comprises mean cell intensity, total cell area, and second-order geometric moment (relating to cell distribution), which are relevant to cell communication, cell extension, and cell migration, respectively.
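The Jackknife accuracies above are leave-one-out estimates over the 30 rats: each rat is held out once, the model is trained on the other 29, and the held-out rat is classified. On 30 rats, 96.67%, 90.00%, and 86.67% correspond to 29, 27, and 26 correct predictions. The sketch below shows the leave-one-out loop; a nearest-centroid classifier stands in for the paper's SVM and genetic-algorithm feature selection purely for brevity, and the one-dimensional toy features are hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Hypothetical stand-in classifier: assign x to the class with the closest mean feature vector."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes[np.argmin(((centroids - x) ** 2).sum(axis=1))]

def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out: train on all samples but one, test on the held-out one, repeat for every sample."""
    correct = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        correct += predict(X[keep], y[keep], X[i]) == y[i]
    return correct / len(y)

# Toy per-animal feature (e.g. a normalized mean cell intensity): 0 = saline, 1 = morphine-tolerant.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(jackknife_accuracy(X, y))
print(round(29 / 30 * 100, 2), round(27 / 30 * 100, 2), round(26 / 30 * 100, 2))
```

If feature selection is part of the pipeline, it has to be repeated inside each leave-one-out fold; selecting features on all 30 rats first would make the jackknife estimate optimistic.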
The present investigation provides the first evidence for the role of gabapentin in the attenuation of morphine tolerance from phenotypic changes of astrocytes and microglia. Based on neuroinflammation prediction, the proposed computer-aided image diagnosis system can greatly facilitate the development of tolerance therapy with anti-inflammatory drugs.
Affiliation(s)
- Shinn-Long Lin
- Department of Anesthesiology, Tri-Service General Hospital and National Defense Medical Center, Taipei, Taiwan
- Fang-Lin Chang
- Department of Anesthesiology, Kang-Ning General Hospital, Taipei, Taiwan
- Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- Phasit Charoenkwan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Kuan-Wei Wang
- Institute of Molecular Medicine and Bioengineering, National Chiao Tung University, Hsinchu, Taiwan
- Hui-Ling Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
14
Abbas SS, Dijkstra TMH, Heskes T. A comparative study of cell classifiers for image-based high-throughput screening. BMC Bioinformatics 2014; 15:342. [PMID: 25336059 PMCID: PMC4287552 DOI: 10.1186/1471-2105-15-342] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 08/07/2014] [Accepted: 09/29/2014] [Indexed: 11/24/2022]
Abstract
Background Millions of cells are present in thousands of images created in high-throughput screening (HTS). Biologists could classify each of these cells into a phenotype by visual inspection, but with millions of cells this visual classification task becomes infeasible. Biologists train classification models on a few thousand visually classified example cells and iteratively improve the training data by visual inspection of the important misclassified phenotypes. Classification methods differ in performance and performance evaluation time. We present a comparative study of the computational performance of gentle boosting, joint boosting CellProfiler Analyst (CPA), support vector machines (linear and radial basis function) and linear discriminant analysis (LDA) on two data sets of HT29 and HeLa cancer cells. Results For the HT29 data set we find that gentle boosting, SVM (linear) and SVM (RBF) are close in performance, but SVM (linear) is faster than gentle boosting and SVM (RBF). For the HT29 data set the average performance difference between SVM (RBF) and SVM (linear) is 0.42%. For the HeLa data set we find that SVM (RBF) outperforms the other classification methods and is on average 1.41% better in performance than SVM (linear). Conclusions Our study proposes SVM (linear) for iterative improvement of the training data and SVM (RBF) for the final classifier to classify all unlabeled cells in the whole data set.
Affiliation(s)
- Syed Saiden Abbas
- Institute for Computing and Information Sciences, Radboud University, Nijmegen, Netherlands.
15.
Joo D, Kwan YS, Song J, Pinho C, Hey J, Won YJ. Identification of cichlid fishes from Lake Malawi using computer vision. PLoS One 2013; 8:e77686. [PMID: 24204918 PMCID: PMC3808401 DOI: 10.1371/journal.pone.0077686] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/17/2013] [Accepted: 09/03/2013] [Indexed: 12/03/2022]
Abstract
BACKGROUND: The explosive evolutionary radiation of cichlid fishes in Lake Malawi has yielded an estimated 500 to 800 haplochromine species, with a surprising degree of diversity not only in color and stripe pattern but also in jaw and body shape. Because these morphological differences have been a central subject of adaptive speciation and taxonomic classification, such high diversity could serve as a foundation for automating the species identification of cichlids. METHODOLOGY/PRINCIPAL FINDINGS: Here we demonstrate a method for automatic classification of the Lake Malawi cichlids based on computer vision and geometric morphometrics. To this end we developed a pipeline that integrates multiple image processing tools to automatically extract informative features of color and stripe patterns from a large set of photographic images of wild cichlids. The extracted information was evaluated with two statistical classifiers, Support Vector Machine and Random Forests. Both classifiers performed better when body shape information was added to the color and stripe features: in addition to coloration and stripe pattern, body shape variables boosted classification accuracy by about 10%. The programs classified 594 live cichlid individuals belonging to 12 different classes (species and sexes) with an average accuracy of 78%, in contrast to a success rate of only 42% by human eyes. The variables that contributed most to accuracy were body height and the hue of the most frequent color. CONCLUSIONS: Computer vision performed notably well in extracting information from the color and stripe patterns of Lake Malawi cichlids, although the information was not sufficient for error-free species identification. Our results indicate that there appears to be an unavoidable difficulty in automatic species identification of cichlid fishes, which may arise from short divergence times and gene flow between closely related species.
Affiliation(s)
- Deokjin Joo
- Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea
- Ye-seul Kwan
- Division of EcoScience, Ewha Womans University, Seoul, Korea
- Jongwoo Song
- Department of Statistics, Ewha Womans University, Seoul, Korea
- Catarina Pinho
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
- Jody Hey
- Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Yong-Jin Won
- Division of EcoScience, Ewha Womans University, Seoul, Korea
16.
Charoenkwan P, Hwang E, Cutler RW, Lee HC, Ko LW, Huang HL, Ho SY. HCS-Neurons: identifying phenotypic changes in multi-neuron images upon drug treatments of high-content screening. BMC Bioinformatics 2013; 14 Suppl 16:S12. [PMID: 24564437 PMCID: PMC3853092 DOI: 10.1186/1471-2105-14-s16-s12] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/08/2023]
Abstract
Background: High-content screening (HCS) has become a powerful tool for drug discovery. However, the discovery of drugs targeting neurons is still hampered by the inability to accurately identify and quantify the phenotypic changes of multiple neurons in a single image (a multi-neuron image) from a high-content screen. It is therefore desirable to develop an automated method for analyzing multi-neuron images. Results: We propose HCS-neurons, an automated analysis method with novel descriptors of neuromorphology features for analyzing HCS-based multi-neuron images. To observe multiple phenotypic changes of neurons, we propose two kinds of descriptors: a neuron feature descriptor (NFD) of 13 neuromorphology features, e.g., neurite length, and generic feature descriptors (GFDs), e.g., Haralick texture. HCS-neurons can 1) automatically extract all quantitative phenotype features in both NFD and GFDs, 2) identify statistically significant phenotypic changes upon drug treatments using ANOVA and regression analysis, and 3) generate an accurate classifier to group neurons treated with different drug concentrations using a support vector machine and an intelligent feature selection method. To evaluate HCS-neurons, we treated P19 neurons with nocodazole (a microtubule-depolymerizing drug that has been shown to impair neurite development) at six concentrations ranging from 0 to 1000 ng/mL. The experimental results show that all 13 NFD features differ significantly across levels of nocodazole drug concentration (NDC), and the phenotypic changes of neurites were consistent with the known effect of nocodazole in promoting neurite retraction. Three identified features, total neurite length, average neurite length, and average neurite area, achieved an independent test accuracy of 90.28% on the six-dosage classification problem.
This NFD module and the neuron image datasets are provided as a freely downloadable MATLAB project at http://iclab.life.nctu.edu.tw/HCS-Neurons. Conclusions: Few automatic methods focus on analyzing multi-neuron images collected from HCS for drug discovery. We provide an automatic HCS-based method for generating accurate classifiers that classify neurons by their phenotypic changes upon drug treatments. The proposed HCS-neurons method can help identify and classify chemical or biological molecules that alter the morphology of a group of neurons in HCS.
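Haralick texture, one of the generic feature descriptors named above, is computed from a grey-level co-occurrence matrix (GLCM). The following minimal Python sketch builds a normalised GLCM for a single pixel offset and derives three classic Haralick statistics from it. It assumes a small image of integer grey levels and one horizontal offset; the function names are hypothetical, and this is not the HCS-Neurons implementation, which the abstract describes as a MATLAB project. Full Haralick feature sets often symmetrise the matrix and average the statistics over several directions.

```python
from collections import defaultdict


def glcm(image, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for a single pixel offset.

    image: list of equal-length rows of integer grey levels.
    (dx, dy): displacement from each reference pixel to its neighbour.
    Returns a dict mapping (i, j) to the probability that grey level i
    has a neighbour of grey level j at that offset.
    """
    rows, cols = len(image), len(image[0])
    counts = defaultdict(int)
    total = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[(image[y][x], image[ny][nx])] += 1
                total += 1
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return {pair: c / total for pair, c in counts.items()}


def haralick_subset(p):
    """Three classic Haralick statistics computed from a normalised GLCM."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    energy = sum(prob ** 2 for prob in p.values())  # angular second moment
    # Inverse difference moment, also called homogeneity.
    homogeneity = sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


if __name__ == "__main__":
    img = [[0, 0, 1],
           [0, 0, 1],
           [2, 2, 2]]
    print(haralick_subset(glcm(img)))
```

A uniform image puts all probability mass on one diagonal cell, giving zero contrast and maximal energy; textured regions spread the mass off the diagonal.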
17.
Coelho LP, Kangas JD, Naik AW, Osuna-Highley E, Glory-Afshar E, Fuhrman M, Simha R, Berget PB, Jarvik JW, Murphy RF. Determining the subcellular location of new proteins from microscope images using local features. Bioinformatics 2013; 29:2343-9. [PMID: 23836142 DOI: 10.1093/bioinformatics/btt392] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Evaluation of previous systems for automated determination of subcellular location from microscope images has been done using datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and useful problem where previously unseen proteins are to be classified. RESULTS: Using CD-tagging, we generated two new image datasets for evaluation of this problem, which contain several different proteins for each location class. Evaluation of previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches, incorporating novel modifications of local features techniques. These extended the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With these, we obtained a large accuracy improvement in our new datasets over existing methods. Additionally, these features help achieve classification improvements for other previously studied datasets. AVAILABILITY: The datasets are available for download at http://murphylab.web.cmu.edu/data/. The software was written in Python and C++ and is available under an open-source license at http://murphylab.web.cmu.edu/software/. The code is split into a library, which can easily be reused for other data, and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address. CONTACT: murphy@cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luis Pedro Coelho
- Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
18.
Autonomous screening of C. elegans identifies genes implicated in synaptogenesis. Nat Methods 2012; 9:977-80. [PMID: 22902935 PMCID: PMC3530956 DOI: 10.1038/nmeth.2141] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 06/06/2012] [Accepted: 07/27/2012] [Indexed: 12/26/2022]
Abstract
Morphometric studies in multicellular organisms are mostly performed manually because of the complexity of multidimensional features and lack of appropriate tools for handling these organisms. Here we present an integrated system to autonomously (i.e. without human supervision) identify and sort mutants with altered subcellular traits in real-time. We performed self-directed screens of synapse formation 100× faster and found both novel genes and phenotypic classes previously unidentified in extensive manual screens.
19.
Zhao CH, Zhang BL, Zhang XZ, Zhao SQ, Li HX. Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers. Neural Comput Appl 2012. [DOI: 10.1007/s00521-012-1057-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/01/2022]
20.
Hochbaum DS, Hsu CN, Yang YT. Ranking of multidimensional drug profiling data by fractional-adjusted bi-partitional scores. Bioinformatics 2012; 28:i106-14. [PMID: 22689749 PMCID: PMC3371864 DOI: 10.1093/bioinformatics/bts232] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Motivation: The recent development of high-throughput drug profiling (high-content screening or HCS) provides a large amount of quantitative multidimensional data. Despite its potential, it poses several challenges for academic and industry analysts alike. This is especially true for ranking the effectiveness of several drugs directly from many thousands of images. This paper introduces, for the first time, a new framework for automatically ordering the performance of drugs, called fractional adjusted bi-partitional score (FABS). This general strategy takes advantage of graph-based formulations and solutions and avoids many shortfalls of the methods traditionally used in practice. We experimented with the FABS framework by implementing it with a specific algorithm, a variant of normalized cut called normalized cut prime (FABS-NC′), producing a ranking of drugs. This algorithm is known to run in polynomial time and can therefore scale well in high-throughput applications. Results: We compare the performance of FABS-NC′ to other methods that could be used for ranking drugs. We devise two variants of the FABS algorithm: FABS-SVM, which uses a support vector machine (SVM) as a black box, and FABS-Spectral, which uses the eigenvector (spectral) technique as a black box. We also compare FABS-NC′ to three other methods that have been considered previously: center ranking (Center), PCA ranking (PCA), and the graph transition energy method (GTEM). The conclusion is encouraging: FABS-NC′ consistently outperforms all five alternatives. FABS-SVM has the second-best performance among these six methods but is far behind FABS-NC′: in some cases FABS-NC′ produces over half more correctly predicted ranking experiment trials than FABS-SVM. Availability: The system and data for the evaluation reported here will be made available upon request to the authors after this manuscript is accepted for publication. Contact: yxy128@berkeley.edu
Affiliation(s)
- Dorit S Hochbaum
- Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA 94720, USA
21.
Jacques RM, Fieller NR, Ainscow EK. A classification updating procedure motivated by high-content screening data. J Appl Stat 2012. [DOI: 10.1080/02664763.2011.580335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
22.
Nanni L, Lumini A. Ensemble of Neural Networks for Automated Cell Phenotype Image Classification. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Subcellular location refers to the spatial distribution of a protein within the cell. Knowing the location of all proteins is crucial for applications ranging from early diagnosis of disease to monitoring the therapeutic effectiveness of drugs. This chapter focuses on machine learning techniques for cell phenotype image classification and aims to point out some of the advantages of using a multi-classifier system instead of a stand-alone method to solve this difficult classification problem. The main problems in this field and the solutions proposed for them are discussed, and a new approach is proposed based on an ensemble of neural networks trained on local and global features. Finally, the most widely used benchmarks for this problem are presented, and an experimental comparison among several state-of-the-art approaches quantifies the performance improvement obtained by the approach proposed in this chapter.
23.
Shamir L. Assessing the efficacy of low-level image content descriptors for computer-based fluorescence microscopy image analysis. J Microsc 2011; 243:284-92. [DOI: 10.1111/j.1365-2818.2011.03502.x] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]
24.
Zhang B, Pham TD. Phenotype recognition with combined features and random subspace classifier ensemble. BMC Bioinformatics 2011; 12:128. [PMID: 21529372 PMCID: PMC3098787 DOI: 10.1186/1471-2105-12-128] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/26/2010] [Accepted: 04/30/2011] [Indexed: 11/24/2022]
Abstract
Background: Automated, image-based high-content screening is a fundamental tool for discovery in biological science. Modern robotic fluorescence microscopes can capture thousands of images from massively parallel experiments such as RNA interference (RNAi) or small-molecule screens. Efficient computational methods are therefore required for automatic cellular phenotype identification that can cope with large image data sets. In this paper we investigated an efficient method for extracting quantitative features from images by combining second-order statistics, or Haralick features, with the curvelet transform. A random subspace classifier ensemble with a multilayer perceptron (MLP) as the base classifier was then used for classification. Haralick features estimate image properties related to second-order statistics based on the grey-level co-occurrence matrix (GLCM), which has been used extensively in image processing applications. The curvelet transform gives a sparser representation of the image than the wavelet transform, offering a description with higher time-frequency resolution and a high degree of directionality and anisotropy, which is particularly appropriate for images rich in edges and curves. Combining Haralick and curvelet features can further increase classification accuracy because the two carry complementary information. We then investigated the applicability of the random subspace (RS) ensemble method for phenotype classification based on microscopy images: each base classifier is trained on an RS-sampled subset of the original feature set, and the ensemble assigns a class label by majority voting. Results: Experimental results on phenotype recognition from three benchmark image sets, HeLa, CHO and RNAi, show the effectiveness of the proposed approach. The combined feature set gives better classification accuracy than either individual one.
The ensemble model classifies better than its individual component neural networks. For the three image sets HeLa, CHO and RNAi, the random subspace ensemble achieves classification rates of 91.20%, 98.86% and 91.03%, respectively, compared with the published results of 84%, 93% and 82% from WND-CHARM, a multi-purpose image classifier that applies wavelet transforms and other feature extraction methods. We also investigated how to estimate the ensemble parameters and found that a moderate feature-subset dimensionality and a small ensemble size give a satisfactory performance improvement. Conclusions: The multiscale and multidirectional characteristics of the curvelet transform suit the description of microscopy images very well. We demonstrate empirically that curvelet-based features are clearly preferable to wavelet-based features for bioimage description. The random subspace ensemble of MLPs performs much better than a number of commonly applied multi-class classifiers in the investigated application of phenotype recognition.
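The random subspace scheme described above, in which each base classifier sees a random subset of feature indices and the ensemble takes a majority vote, can be sketched in a few lines of Python. To keep the example dependency-free, a nearest-centroid learner stands in for the MLP base classifier used by the authors; the function names, `n_members` and `subset_size` are hypothetical, and the sketch makes no attempt to reproduce the reported accuracies.

```python
import random
from collections import Counter


def train_centroids(X, y, features):
    """Nearest-centroid base learner fitted on one feature subset.

    Stands in for the MLP base classifier described in the abstract.
    Returns {label: centroid restricted to `features`}.
    """
    sums = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, features, row):
    """Return the label whose centroid is closest (squared Euclidean) on `features`."""
    def dist2(label):
        c = centroids[label]
        return sum((row[f] - c[k]) ** 2 for k, f in enumerate(features))
    return min(centroids, key=dist2)


def random_subspace_ensemble(X, y, n_members=11, subset_size=2, seed=0):
    """Train n_members base learners, each on a random subset of feature indices."""
    rng = random.Random(seed)
    n_features = len(X[0])
    members = []
    for _ in range(n_members):
        features = rng.sample(range(n_features), subset_size)  # distinct indices
        members.append((features, train_centroids(X, y, features)))
    return members


def predict_ensemble(members, row):
    """Majority vote over the base learners' predictions."""
    votes = Counter(predict_centroids(c, f, row) for f, c in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0, 0, 1, 0], [1, 0, 0, 1], [0, 1, 0, 0],
         [10, 9, 10, 11], [9, 10, 11, 10], [10, 10, 9, 10]]
    y = ["a", "a", "a", "b", "b", "b"]
    ensemble = random_subspace_ensemble(X, y)
    print(predict_ensemble(ensemble, [9, 9, 9, 9]))  # prints b
```

Replacing `train_centroids`/`predict_centroids` with any other learner leaves the ensemble logic unchanged; the abstract's finding that a moderate subset dimensionality and a small ensemble suffice corresponds to tuning `subset_size` and `n_members` here.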
Affiliation(s)
- Bailing Zhang
- Department of Computer Science and Software Engineering, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, PR China.
25.
Peng T, Murphy RF. Image-derived, three-dimensional generative models of cellular organization. Cytometry A 2011; 79:383-91. [PMID: 21472848 DOI: 10.1002/cyto.a.21066] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 12/02/2010] [Revised: 03/04/2011] [Accepted: 03/14/2011] [Indexed: 02/01/2023]
Abstract
Given the importance of subcellular location to protein function, computational simulations of cell behaviors will ultimately require the ability to model the distributions of proteins within organelles and other structures. Toward this end, statistical learning methods have previously been used to build models of sets of two-dimensional microscope images, where each set contains multiple images for a single subcellular location pattern. The model learned from each set of images not only represents the pattern but also captures the variation in that pattern from cell to cell. The models consist of sub-models for nuclear shape, cell shape, organelle size and shape, and organelle distribution relative to nuclear and cell boundaries, and allow synthesis of images with the expectation that they are drawn from the same underlying statistical distribution as the images used to train them. Here we extend this generative models approach to three dimensions using a similar framework, permitting protein subcellular locations to be described more accurately. Models of different patterns can be combined to yield a synthetic multi-channel image containing as many proteins as desired, something that is difficult to obtain by direct microscope imaging for more than a few proteins. In addition, the model parameters represent a more compact and interpretable way of communicating subcellular patterns than descriptive image features and may be particularly effective for automated identification of changes in subcellular organization caused by perturbagens.
Affiliation(s)
- Tao Peng
- Center for Bioimage Informatics, and Department of Biomedical Engineering, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, Pennsylvania 15213, USA
26.
Lin YS, Lin CC, Tsai YS, Ku TC, Huang YH, Hsu CN. A spectral graph theoretic approach to quantification and calibration of collective morphological differences in cell images. Bioinformatics 2010; 26:i29-37. [PMID: 20529919 PMCID: PMC2881379 DOI: 10.1093/bioinformatics/btq194] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/20/2022]
Abstract
Motivation: High-throughput image-based assay technologies can rapidly produce a large number of cell images for drug screening, but data analysis is still a major bottleneck that limits their utility. Quantifying a wide variety of morphological differences observed in cell images under different drug influences is still a challenging task because the result can be highly sensitive to sampling and noise. Results: We propose a graph-based approach to cell image analysis. We define graph transition energy to quantify morphological differences between image sets. A spectral graph theoretic regularization is applied to transform the feature space based on training examples of extremely different images to calibrate the quantification. Calibration is essential for a practical quantification method because we need to measure the confidence of the quantification. We applied our method to quantify the degree of partial fragmentation of mitochondria in collections of fluorescent cell images. We show that with transformation, the quantification can be more accurate and sensitive than that without transformation. We also show that our method outperforms competing methods, including neighbourhood component analysis and the multi-variate drug profiling method by Loo et al. We illustrate its utility with a study of Annonaceous acetogenins, a family of compounds with drug potential. Our result reveals that squamocin induces more fragmented mitochondria than muricin A. Availability: Mitochondrial cell images, their corresponding feature sets (SSLF and WSLF) and the source code of our proposed method are available at http://aiia.iis.sinica.edu.tw/. Contact: chunnan@iis.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-Shi Lin
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
27.
Abstract
Chemical address tags can be defined as specific structural features shared by a set of bioimaging probes having a predictable influence on cell-associated visual signals obtained from these probes. Here, using a large image dataset acquired with a high content screening instrument, machine vision and cheminformatics analysis have been applied to reveal chemical address tags. With a combinatorial library of fluorescent molecules, fluorescence signal intensity, spectral, and spatial features characterizing each one of the probes' visual signals were extracted from images acquired with the three different excitation and emission channels of the imaging instrument. With multivariate regression, the additive contribution from each one of the different building blocks of the bioimaging probes toward each measured, cell-associated image-based feature was calculated. In this manner, variations in the chemical features of the molecules were associated with the resulting staining patterns, facilitating quantitative, objective analysis of chemical address tags. Hierarchical clustering and paired image-cheminformatics analysis revealed key structure-property relationships amongst many building blocks of the fluorescent molecules. The results point to different chemical modifications of the bioimaging probes that can exert similar (or different) effects on the probes' visual signals. Inspection of the clustered structures suggests intramolecular charge migration or partial charge distribution as potential mechanistic determinants of chemical address tag behavior.
Affiliation(s)
- Kerby Shedden
- Department of Statistics, University of Michigan, Ann Arbor, Michigan 48109, USA
28.
Wang W, Ozolek JA, Rohde GK. Detection and classification of thyroid follicular lesions based on nuclear structure from histopathology images. Cytometry A 2010; 77:485-94. [PMID: 20099247 DOI: 10.1002/cyto.a.20853] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Indexed: 11/06/2022]
Abstract
Follicular lesions of the thyroid are traditionally difficult and tedious challenges in diagnostic surgical pathology in part due to lack of obvious discriminatory cytological and microarchitectural features. We describe a computerized method to detect and classify follicular adenoma of the thyroid, follicular carcinoma of the thyroid, and normal thyroid based on the nuclear chromatin distribution from digital images of tissue obtained by routine histological methods. Our method is based on determining whether a set of nuclei, obtained from histological images using automated image segmentation, is most similar to sets of nuclei obtained from normal or diseased tissues. This comparison is performed utilizing numerical features, a support vector machine, and a simple voting strategy. We also describe novel methods to identify unique and defining chromatin patterns pertaining to each class. Unlike previous attempts in detecting and classifying these thyroid lesions using computational imaging, our results show that our method can automatically classify the data pertaining to 10 different human cases with 100% accuracy after blind cross validation using at most 43 nuclei randomly selected from each patient. We conclude that nuclear structure alone contains enough information to automatically classify the normal thyroid, follicular carcinoma, and follicular adenoma, as long as groups of nuclei (instead of individual ones) are used. We also conclude that the distribution of nuclear size and chromatin concentration (how tightly packed it is) seem to be discriminating features between nuclei of follicular adenoma, follicular carcinoma, and normal thyroid.
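The set-level decision described above, in which a patient is classified from a group of nuclei rather than from a single nucleus, can be illustrated with a simple plurality vote. This Python sketch assumes that per-nucleus labels have already been produced by some upstream classifier (the abstract uses a support vector machine on numerical features); `classify_case` and its parameters are hypothetical names, and the authors' exact voting rule may differ. The default of 43 nuclei mirrors the maximum sample size quoted in the abstract.

```python
import random
from collections import Counter


def classify_case(nucleus_labels, max_nuclei=43, seed=0):
    """Assign one label to a case by plurality vote over a random sample of nuclei.

    nucleus_labels: per-nucleus predictions from some upstream classifier.
    Returns (winning_label, fraction_of_sampled_nuclei_that_voted_for_it).
    """
    if not nucleus_labels:
        raise ValueError("case has no segmented nuclei")
    rng = random.Random(seed)
    k = min(max_nuclei, len(nucleus_labels))
    sample = rng.sample(nucleus_labels, k)  # at most max_nuclei nuclei per case
    label, count = Counter(sample).most_common(1)[0]
    return label, count / k


if __name__ == "__main__":
    labels = ["adenoma"] * 30 + ["normal"] * 10 + ["carcinoma"] * 5
    print(classify_case(labels))
```

The returned vote fraction gives a rough confidence for the case-level call: a case whose nuclei split evenly between classes is less trustworthy than one with a clear majority.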
Affiliation(s)
- Wei Wang
- Center for Bioimage Informatics, Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
29.
Zhang Z, Ge Y, Zhang D, Zhou X. High-content analysis in monastrol suppressor screens. A neural network-based classification approach. Methods Inf Med 2010; 50:265-72. [PMID: 20602002 DOI: 10.3414/me09-01-0030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/29/2009] [Accepted: 03/22/2010] [Indexed: 11/09/2022]
Abstract
OBJECTIVES: High-content screening (HCS) via automated fluorescence microscopy is a powerful technology for the effective expression of cellular processes. However, HCS generally produces huge image datasets that are difficult to handle and analyze. We proposed an automatic classification approach for simultaneous feature extraction and cell phenotype recognition of monoaster and bipolar cells in an HCS system. METHODS: The proposed approach consisted of image segmentation, feature extraction, and classification. Image segmentation was based on the Laplacian of Gaussian (LoG) edge detection method. To reduce the effect of noise on cellular images, we employed an adaptive threshold in the microtubule channel. Principal component analysis was used in the feature selection process. Classification was performed with a back-propagation neural network (BPNN). Using this approach, cell phases were distinguished from three-channel acquisitions of cellular images, and the numbers of bipolar and monoaster cells were counted automatically. RESULTS: The validity of this approach was examined by applying it to screen the response of drug compounds in suppressing monastrol. Our results indicate that the proposed algorithm improves the recognition rates of monoaster and bipolar cells to 97.98% and 93.12%, respectively, compared with 97.02% and 86.96% obtained from the same samples by multi-phenotypic mitotic analysis (MMA). CONCLUSIONS: We have shown that BPNN is a valuable tool for classifying cell phenotypes. Further improving classification performance may require more test data, better-optimized feature selection approaches, and more advanced classifiers, which will be investigated in future work.
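Laplacian of Gaussian (LoG) edge detection, used above for segmentation, filters the image with the Laplacian of a Gaussian and marks edges where the filter response changes sign. The minimal pure-Python sketch below assumes a 2-D list of grey levels, σ = 1 and a kernel truncated at 3σ; it checks only horizontal sign changes and omits the adaptive thresholding the authors applied to the microtubule channel. All names are hypothetical, and this is a generic version of the technique rather than the authors' code.

```python
import math


def log_kernel(sigma, radius=None):
    """Sampled Laplacian-of-Gaussian kernel, shifted to sum to zero."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))
    size = 2 * radius + 1
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = (x * x + y * y) / (2 * sigma * sigma)
            # LoG(x, y) = (r2 - 1) * exp(-r2) / (pi * sigma**4)
            row.append((r2 - 1) * math.exp(-r2) / (math.pi * sigma ** 4))
        kernel.append(row)
    mean = sum(sum(row) for row in kernel) / (size * size)
    # Zero sum means flat regions give a (numerically) zero response.
    return [[v - mean for v in row] for row in kernel]


def filter2d(image, kernel):
    """'Valid' 2-D correlation (equal to convolution for this symmetric kernel)."""
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    out = []
    for y in range(r, h - r):
        row = []
        for x in range(r, w - r):
            s = 0.0
            for j in range(k):
                for i in range(k):
                    s += kernel[j][i] * image[y + j - r][x + i - r]
            row.append(s)
        out.append(row)
    return out


def zero_crossings(response, eps=1e-9):
    """Return (row, col) positions where the response changes sign horizontally.

    Values with magnitude below eps are treated as zero so that round-off
    noise in flat regions is not reported as an edge.
    """
    def sign(v):
        return 0 if abs(v) < eps else (1 if v > 0 else -1)

    found = set()
    for yy, row in enumerate(response):
        for xx in range(len(row) - 1):
            if sign(row[xx]) * sign(row[xx + 1]) < 0:
                found.add((yy, xx))
    return found


if __name__ == "__main__":
    # Vertical step edge: columns 0-7 dark (0), columns 8-15 bright (10).
    step = [[0] * 8 + [10] * 8 for _ in range(9)]
    resp = filter2d(step, log_kernel(1.0))
    print(sorted(zero_crossings(resp)))
```

Adding a vertical comparison (each response against the one below it) would also catch horizontal edges; practical code would normally use an optimised filter from an image-processing library instead of the explicit loops.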
Affiliation(s)
- Z Zhang
- Institute of Acoustics, Key Lab of Modern Acoustics, MOE, Nanjing University, Nanjing, China
30.
Hu Y, Osuna-Highley E, Hua J, Nowicki TS, Stolz R, McKayle C, Murphy RF. Automated analysis of protein subcellular location in time series images. Bioinformatics 2010; 26:1630-6. [PMID: 20484328 DOI: 10.1093/bioinformatics/btq239] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Image analysis, machine learning and statistical modeling have become well established for the automatic recognition and comparison of the subcellular locations of proteins in microscope images. By using a comprehensive set of features describing static images, major subcellular patterns can be distinguished with near perfect accuracy. We now extend this work to time series images, which contain both spatial and temporal information. The goal is to use temporal features to improve recognition of protein patterns that are not fully distinguishable by their static features alone. RESULTS: We have adopted and designed five sets of features for capturing temporal behavior in 2D time series images, based on object tracking, temporal texture, normal flow, Fourier transforms and autoregression. Classification accuracy on an image collection for 12 fluorescently tagged proteins was increased when temporal features were used in addition to static features. Temporal texture, normal flow and Fourier transform features were most effective at increasing classification accuracy. We therefore extended these three feature sets to 3D time series images, but observed no significant improvement over results for 2D images. The methods for 2D and 3D temporal pattern analysis do not require segmentation of images into single cell regions, and are suitable for automated high-throughput microscopy applications. AVAILABILITY: Images, source code and results will be available upon publication at http://murphylab.web.cmu.edu/software. CONTACT: murphy@cmu.edu.
Affiliation(s)
- Yanhua Hu
- Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
31.
Banerjee AK, M S, M N, Murty US. Classification and clustering analysis of pyruvate dehydrogenase enzyme based on their physicochemical properties. Bioinformation 2010; 4:456-62. [PMID: 20975910 PMCID: PMC2951700 DOI: 10.6026/97320630004456] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/12/2010] [Revised: 03/02/2010] [Accepted: 04/09/2010] [Indexed: 11/23/2022]
Abstract
Biological systems are highly organized and tightly coordinated, and they maintain considerable complexity. The growth of secondary data generation and the progress of modern data-mining techniques provide an opportunity to discover hidden intra- and inter-relationships within these nonlinear datasets, which will help us understand complex biological phenomena more efficiently. In this paper we report a comparative classification of pyruvate dehydrogenase protein sequences from bacterial sources based on 28 physicochemical parameters (such as bulkiness, hydrophobicity, total positively and negatively charged residues, α-helices and β-strands) and the compositions of the 20 amino acids. Logistic, MLP (Multilayer Perceptron), SMO (Sequential Minimal Optimization), RBFN (Radial Basis Function Network) and SL (Simple Logistic) methods were compared in this study. MLP was the best method, with a maximum average accuracy of 88.20%. The same dataset was also clustered using a 2×2 grid of a two-dimensional SOM (Self-Organizing Map). Clustering analysis revealed the proximity of the unannotated sequences to the Mycobacterium and Synechococcus genera.
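The 20 amino acid composition features mentioned above can be computed as below. This is a minimal sketch; the abstract does not state how compositions were normalized, so expressing them as percentages of the standard residues and ignoring ambiguous letters such as X are assumptions, as is the function name `aa_composition`.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aa_composition(sequence):
    """Return the percentage of each standard amino acid in `sequence`.

    Non-standard letters (e.g. X, B, Z) are excluded from the denominator,
    which is one of several possible conventions.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}
```

The resulting 20 values would be appended to the physicochemical descriptors to form one feature vector per sequence.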
Affiliation(s)
- Amit Kumar Banerjee
- Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Hyderabad-500607, A.P, India
32
Nanni L, Lumini A, Brahnam S. Local binary patterns variants as texture descriptors for medical image analysis. Artif Intell Med 2010; 49:117-25. [PMID: 20338737 DOI: 10.1016/j.artmed.2010.02.006] [Received: 03/02/2009] [Revised: 02/23/2010] [Accepted: 02/27/2010]
Abstract
OBJECTIVE This paper focuses on the use of image-based machine learning techniques in medical image analysis. In particular, we present some variants of local binary patterns (LBP), which are widely considered the state of the art among texture descriptors. After providing a detailed review of the literature on existing LBP variants and discussing the most salient approaches, along with their pros and cons, we report new experiments using several LBP-based descriptors and propose a set of novel texture descriptors for the representation of biomedical images. The standard LBP operator is defined as a gray-scale invariant texture measure, derived from a general definition of texture in a local neighborhood. Our variants are obtained by considering different shapes for the neighborhood calculation and different encodings for the evaluation of the local gray-scale difference. These sets of features are then used to train a machine-learning classifier (a stand-alone support vector machine). METHODS AND MATERIALS Extensive experiments are conducted using the following three datasets: RESULTS AND CONCLUSION Our results show that the novel variant named elongated quinary patterns (EQP) is a high-performing method among those proposed in this work for extracting texture information on all the tested datasets. EQP is based on an elliptic neighborhood and a five-level scale for encoding the local gray-scale difference. Particularly interesting are the results on the widely studied 2D-HeLa dataset, where, to the best of our knowledge, the proposed descriptor obtains the highest performance among all the texture descriptors tested in the literature.
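For readers unfamiliar with the operator, the sketch below computes the standard 8-neighbor LBP code on a square 3×3 neighborhood, plus a five-level (quinary) encoding of the gray-level difference in the spirit of EQP. It is an illustrative assumption rather than the authors' implementation: it does not use EQP's elliptic neighborhood, and the thresholds `tau1` and `tau2` are hypothetical parameters.

```python
import numpy as np

# Offsets of the 8 neighbors in a 3x3 window, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img):
    """Basic 8-neighbor LBP code for every interior pixel of a 2D image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Shifted view aligned with `center`: the neighbor at offset (dy, dx).
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int64) << bit
    return codes

def quinary_levels(diff, tau1, tau2):
    """Map gray-level differences to five levels {-2, ..., 2}, with 0 < tau1 < tau2."""
    diff = np.asarray(diff, dtype=float)
    levels = np.zeros(diff.shape, dtype=np.int64)
    levels[diff >= tau1] = 1
    levels[diff >= tau2] = 2
    levels[diff < -tau1] = -1
    levels[diff < -tau2] = -2
    return levels

def lbp_histogram(img):
    """256-bin normalized histogram of LBP codes, usable as a feature vector."""
    hist = np.bincount(lbp_codes(img).ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

The normalized histogram returned by `lbp_histogram` is the kind of feature vector that would then be passed to a classifier such as an SVM.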
Affiliation(s)
- Loris Nanni
- Department of Electronic, Informatics and Systems, Università di Bologna, Via Venezia 52, 47023 Cesena, Italy.
33
Cheng J, Veronika M, Rajapakse JC. Identifying Cells in Histopathological Images. Recognizing Patterns in Signals, Speech, Images and Videos 2010. [DOI: 10.1007/978-3-642-17711-8_25]
34
Zhu S, Matsudaira P, Welsch R, Rajapakse JC. Quantification of Cytoskeletal Protein Localization from High-Content Images. ACTA ACUST UNITED AC 2010. [DOI: 10.1007/978-3-642-16001-1_25]
35
Novel features for automated cell phenotype image classification. Adv Exp Med Biol 2010; 680:207-13. [PMID: 20865503 DOI: 10.1007/978-1-4419-5913-3_24]
Abstract
The most common method of handling automated cell phenotype image classification is to determine a common set of optimal features and then apply standard machine-learning algorithms to classify them. In this chapter, we use advanced methods to determine a set of optimized features for training a random subspace ensemble of Levenberg-Marquardt neural networks. The process requires that we first run several experiments to determine which individual features offer the most information. The best-performing features are then concatenated and used in the ensemble classification. Applying this approach, we obtained an average accuracy of 97.4% on the three best-established benchmarks for this problem: the 2D HeLa dataset and both the endogenous and the transfected LOCATE mouse protein subcellular localization databases.
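The random subspace idea can be sketched as follows: each ensemble member is trained on a randomly chosen subset of the feature columns, and the members vote. As an assumption made to keep the example self-contained, a nearest-centroid classifier stands in for the Levenberg-Marquardt neural networks used in the chapter; the class name `RandomSubspaceEnsemble` and the `subspace_frac` parameter are hypothetical.

```python
import numpy as np

class RandomSubspaceEnsemble:
    """Random subspace ensemble with nearest-centroid members (illustrative only)."""

    def __init__(self, n_members=10, subspace_frac=0.5, seed=0):
        self.n_members = n_members
        self.subspace_frac = subspace_frac
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n_features = X.shape[1]
        k = max(1, int(round(self.subspace_frac * n_features)))
        self.members_ = []
        for _ in range(self.n_members):
            # Each member sees its own random subset of k feature columns.
            cols = self.rng.choice(n_features, size=k, replace=False)
            centroids = np.stack([X[y == c][:, cols].mean(axis=0) for c in self.classes_])
            self.members_.append((cols, centroids))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        votes = np.zeros((X.shape[0], len(self.classes_)), dtype=int)
        for cols, centroids in self.members_:
            # Squared Euclidean distance from each sample to each class centroid.
            d = ((X[:, cols][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            votes[np.arange(X.shape[0]), d.argmin(axis=1)] += 1
        # Majority vote across members.
        return self.classes_[votes.argmax(axis=1)]
```

Because every member sees a different slice of the feature vector, the ensemble can exploit a large concatenated feature set without any single member depending on all of it.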
36
Harder N, Mora-Bermúdez F, Godinez WJ, Wünsche A, Eils R, Ellenberg J, Rohr K. Automatic analysis of dividing cells in live cell movies to detect mitotic delays and correlate phenotypes in time. Genome Res 2009; 19:2113-24. [PMID: 19797680 DOI: 10.1101/gr.092494.109]
Abstract
Live-cell imaging allows detailed dynamic cellular phenotyping for cell biology and, in combination with small molecule or drug libraries, for high-content screening. Fully automated analysis of live cell movies has been hampered by the lack of computational approaches that allow tracking and recognition of individual cell fates over time in a precise manner. Here, we present a fully automated approach to analyze time-lapse movies of dividing cells. Our method dynamically categorizes cells into seven phases of the cell cycle and five aberrant morphological phenotypes over time. It reliably tracks cells and their progeny and can thus measure the length of mitotic phases and detect cause and effect if mitosis goes awry. We applied our computational scheme to annotate mitotic phenotypes induced by RNAi gene knockdown of CKAP5 (also known as ch-TOG) or by treatment with the drug nocodazole. Our approach can be readily applied to comparable assays aiming at uncovering the dynamic cause of cell division phenotypes.
Affiliation(s)
- Nathalie Harder
- University of Heidelberg, IPMB, BIOQUANT, and DKFZ Heidelberg, Department of Bioinformatics and Functional Genomics, Biomedical Computer Vision Group, D-69120 Heidelberg, Germany
37
Pinidiyaarachchi A, Zieba A, Allalou A, Pardali K, Wählby C. A detailed analysis of 3D subcellular signal localization. Cytometry A 2009; 75:319-28. [PMID: 19006073 DOI: 10.1002/cyto.a.20663]
Abstract
Detection and localization of fluorescent signals in relation to other subcellular structures is an important task in various biological studies. Many methods for analysis of fluorescence microscopy image data are limited to 2D. As cells are in fact 3D structures, there is a growing need for robust methods for analysis of 3D data. This article presents an approach for detecting point-like fluorescent signals and analyzing their subnuclear position. Cell nuclei are delineated using marker-controlled (seeded) 3D watershed segmentation. User-defined object and background seeds are given as input, and gradient information defines merging and splitting criteria. Point-like signals are detected using a modified stable wave detector and localized in relation to the nuclear membrane using distance shells. The method was applied to a set of biological data studying the localization of Smad2-Smad4 protein complexes in relation to the nuclear membrane. Smad complexes appear as early as 1 min after stimulation while the highest signal concentration is observed 45 min after stimulation, followed by a concentration decrease. The robust 3D signal detection and concentration measures obtained using the proposed method agree with previous observations while also revealing new information regarding the complex formation.
38
Lipshtat A, Neves SR, Iyengar R. Specification of spatial relationships in directed graphs of cell signaling networks. Ann N Y Acad Sci 2009; 1158:44-56. [PMID: 19348631 DOI: 10.1111/j.1749-6632.2008.03748.x]
Abstract
Graph theory provides a useful and powerful tool for the analysis of cellular signaling networks. Intracellular components such as cytoplasmic signaling proteins, transcription factors, and genes are connected by links, representing various types of chemical interactions that result in functional consequences. However, these graphs lack important information regarding the spatial distribution of cellular components. The ability of two cellular components to interact depends not only on their mutual chemical affinity but also on colocalization to the same subcellular region. Localization of components is often used as a regulatory mechanism to achieve specific effects in response to different receptor signals. Here we describe an approach for incorporating spatial distribution into graphs and for the development of mixed graphs where links are specified by mutual chemical affinity as well as colocalization. We suggest that such mixed graphs will provide more accurate descriptions of functional cellular networks and their regulatory capabilities and aid in the development of large-scale predictive models of cellular behavior.
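A toy version of the mixed-graph idea can be written as a filter over interaction pairs. It assumes that interactions are given as pairs and that each component's localization is a set of compartment labels, and it keeps a chemical-affinity link only when the two partners share at least one compartment. The function name `mixed_graph`, the component names and the compartments are hypothetical, and the authors' formulation is richer than this simple filter.

```python
def mixed_graph(interactions, location):
    """Keep an affinity link only if both partners share a compartment.

    interactions: iterable of (a, b) pairs with mutual chemical affinity.
    location: dict mapping each component to a set of compartment labels.
    Returns a dict mapping each retained link to its shared compartments.
    """
    edges = {}
    for a, b in interactions:
        # Components with no recorded location share no compartment.
        shared = location.get(a, set()) & location.get(b, set())
        if shared:
            edges[(a, b)] = shared
    return edges
```

Labelling each retained link with its shared compartments records where the interaction can occur, which is the spatial information a plain interaction graph lacks.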
Affiliation(s)
- Azi Lipshtat
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York, USA
39
Vizeacoumar FJ, Chong Y, Boone C, Andrews BJ. A picture is worth a thousand words: Genomics to phenomics in the yeast Saccharomyces cerevisiae. FEBS Lett 2009; 583:1656-61. [DOI: 10.1016/j.febslet.2009.03.068] [Received: 02/17/2009] [Revised: 03/26/2009] [Accepted: 03/31/2009]
40
Newberg J, Hua J, Murphy RF. Location proteomics: systematic determination of protein subcellular location. Methods Mol Biol 2009; 500:313-332. [PMID: 19399439 DOI: 10.1007/978-1-59745-525-1_11]
Abstract
Proteomics seeks the systematic and comprehensive understanding of all aspects of proteins, and location proteomics is the relatively new subfield of proteomics concerned with the location of proteins within cells. This review provides a guide to the widening selection of methods for studying location proteomics and integrating the results into systems biology. Automated and objective methods for determining protein subcellular location have been described based on extracting numerical features from fluorescence microscope images and applying machine learning approaches to them. Systems to recognize all major protein subcellular location patterns in both two-dimensional and three-dimensional HeLa cell images with high accuracy (over 95% and 98%, respectively) have been built. The feasibility of objectively grouping proteins into subcellular location families, and in the process discovering new subcellular patterns, has been demonstrated using cluster analysis of images from a library of randomly tagged protein clones. Generative models can be built to effectively capture and communicate the patterns in these families. While automated methods for high-resolution determination of subcellular location are now available, the task of applying these methods to all expressed proteins in many different cell types under many conditions represents a very significant challenge.
Affiliation(s)
- Justin Newberg
- Department of Biomedical Engineering and Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, PA, USA
41
Wei N, Flaschel E, Friehs K, Nattkemper TW. A machine vision system for automated non-invasive assessment of cell viability via dark field microscopy, wavelet feature selection and classification. BMC Bioinformatics 2008; 9:449. [PMID: 18939996 PMCID: PMC2582244 DOI: 10.1186/1471-2105-9-449] [Received: 06/20/2007] [Accepted: 10/21/2008]
Abstract
BACKGROUND Cell viability is one of the basic properties indicating the physiological state of the cell; thus, it has long been one of the major considerations in biotechnological applications. Conventional methods for extracting information about cell viability usually require reagents to be applied to the target cells. These reagent-based techniques are reliable and versatile; however, some of them may be invasive and even toxic to the target cells. To support automated noninvasive assessment of cell viability, a machine vision system has been developed. RESULTS This system is based on supervised learning. It learns from images of selected cell populations and trains classifiers, which are then used to evaluate images of given cell populations obtained via dark field microscopy. Wavelet decomposition is performed on the cell images, and the energy and entropy of each wavelet subimage are computed as features. A feature selection algorithm is implemented to achieve better performance. Correlation between the results of the machine vision system and commonly accepted gold standards becomes stronger when wavelet features are used. The best performance is achieved with a selected subset of wavelet features. CONCLUSION The machine vision system based on dark field microscopy in conjunction with supervised machine learning and wavelet feature selection automates cell viability assessment and yields results comparable to those of commonly accepted methods. Wavelet features are found to be suitable for describing the discriminative properties of live and dead cells in viability classification. According to the analysis, live cells exhibit more morphological detail and are intracellularly more organized than dead ones, which display more homogeneous and diffuse gray values throughout the cells. Feature selection increases the system's performance because it excludes redundant or misleading information that may be contained in the raw data.
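The energy and entropy features described above can be sketched with a hand-written orthonormal Haar transform, so that no wavelet library is needed. The wavelet family, the number of levels and the exact energy and entropy definitions (mean squared coefficient; Shannon entropy of the normalized squared coefficients) are assumptions, since the abstract does not specify them. Image sides must be divisible by 2**levels.

```python
import numpy as np

def haar2d(img):
    """One level of an orthonormal 2D Haar transform (image sides must be even)."""
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    approx = (a + b + c + d) / 2
    horiz = (a + b - c - d) / 2
    vert = (a - b + c - d) / 2
    diag = (a - b - c + d) / 2
    return approx, (horiz, vert, diag)

def subband_energy_entropy(coeffs):
    """Energy (mean squared coefficient) and Shannon entropy of one subimage."""
    sq = np.asarray(coeffs, dtype=float) ** 2
    total = sq.sum()
    energy = total / sq.size
    if total == 0:
        return energy, 0.0
    p = sq.ravel() / total
    p = p[p > 0]  # 0 * log(0) is taken as 0
    entropy = float(-(p * np.log2(p)).sum())
    return energy, entropy

def wavelet_features(img, levels=2):
    """Energy and entropy of every subimage of a `levels`-deep Haar decomposition."""
    approx = np.asarray(img, dtype=float)
    features = []
    for _ in range(levels):
        approx, details = haar2d(approx)
        for band in details:
            features.extend(subband_energy_entropy(band))
    features.extend(subband_energy_entropy(approx))
    return np.array(features)
```

With `levels=2` this yields 14 numbers per image (three detail subimages per level plus the final approximation, two features each), which a feature selection step could then prune.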
Affiliation(s)
- Ning Wei
- Bielefeld University, Faculty of Technology, Fermentation Engineering, PO-Box 100131, 33501 Bielefeld, Germany.
42
An incremental approach to automated protein localisation. BMC Bioinformatics 2008; 9:445. [PMID: 18937856 PMCID: PMC2603336 DOI: 10.1186/1471-2105-9-445] [Received: 06/30/2008] [Accepted: 10/20/2008]
Abstract
Background The subcellular localisation of proteins in intact living cells is an important means of gaining information about protein functions. Even dynamic processes, which can barely be predicted from amino acid sequences, can be captured. Besides increasing our knowledge of intracellular processes, this information facilitates the development of innovative therapies and new diagnostic methods. To perform such a localisation, the proteins under analysis are usually fused with a fluorescent protein so that they can be observed with a fluorescence microscope and analysed. In recent years, several automated methods have been proposed for performing such analyses. Two types of approaches can be distinguished: techniques that recognise a fixed set of protein locations and methods that identify new ones. To our knowledge, a combination of both approaches (i.e. a technique that enables supervised learning using a known set of protein locations and can identify and incorporate new protein locations afterwards) has not been presented yet. Furthermore, associated problems, e.g. the recognition of the cells to be analysed, have usually been neglected. Results We introduce a novel approach to automated protein localisation in living cells. In contrast to well-known techniques, the protein localisation technique presented in this article aims to combine the two types of approaches described above: after unknown protein locations are identified automatically, a user can incorporate them into the pre-trained system. An incremental neural network allows the classification of a fixed set of protein locations as well as the detection, clustering and incorporation of additional patterns that occur during an experiment. The proposed technique achieves promising results on both tasks.
In addition, the protein localisation procedure has been adapted to an existing cell recognition approach, which makes it especially well suited to high-throughput investigations where user interaction has to be avoided. Conclusion We have shown that several aspects required for developing an automatic protein localisation technique, namely the recognition of cells, the classification of protein distribution patterns into a set of learnt protein locations, and the detection and learning of new locations, can be combined successfully. The proposed method therefore constitutes a crucial step towards making image-based protein localisation techniques amenable to large-scale experiments.
43
A reliable method for cell phenotype image classification. Artif Intell Med 2008; 43:87-97. [DOI: 10.1016/j.artmed.2008.03.005] [Received: 09/10/2007] [Revised: 02/28/2008] [Accepted: 03/10/2008]
44
Newberg J, Murphy RF. A framework for the automated analysis of subcellular patterns in human protein atlas images. J Proteome Res 2008; 7:2300-8. [PMID: 18435555 DOI: 10.1021/pr7007626]
Abstract
The systematic study of subcellular location patterns is required to fully characterize the human proteome, as subcellular location provides critical context necessary for understanding a protein's function. The analysis of tens of thousands of expressed proteins for the many cell types and cellular conditions under which they may be found creates a need for automated subcellular pattern analysis. We therefore describe the application of automated methods, previously developed and validated by our laboratory on fluorescence micrographs of cultured cell lines, to analyze subcellular patterns in tissue images from the Human Protein Atlas. The Atlas currently contains images of over 3000 protein patterns in various human tissues obtained using immunohistochemistry. We chose a 16-protein subset from the Atlas that reflects the major classes of subcellular location. We then separated DNA and protein staining in the images, extracted various features from each image, and trained a support vector machine classifier to recognize the protein patterns. Our results show that our system can distinguish the patterns with 83% accuracy in 45 different tissues, and when only the most confident classifications are considered, accuracy rises to 97%. These results are encouraging given that the tissues contain many different cell types organized in different manners, and that the Atlas images are of moderate resolution. The approach described is an important starting point for automatically assigning subcellular locations on a proteome-wide basis for collections of tissue images such as the Atlas.
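Restricting evaluation to the most confident classifications, as in the 83% versus 97% comparison above, amounts to thresholding a per-image confidence score. A minimal sketch, assuming such a score is available (for instance a classifier's posterior probability); the function name `accuracy_at_confidence`, the threshold and the example data are hypothetical.

```python
import numpy as np

def accuracy_at_confidence(y_true, y_pred, confidence, threshold):
    """Accuracy and coverage restricted to predictions with confidence >= threshold.

    Returns (accuracy, coverage); accuracy is NaN when nothing passes the threshold.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = np.asarray(confidence, dtype=float) >= threshold
    coverage = float(keep.mean())  # fraction of images that are still classified
    if not keep.any():
        return float("nan"), 0.0
    accuracy = float((y_true[keep] == y_pred[keep]).mean())
    return accuracy, coverage
```

Raising the threshold typically trades coverage for accuracy, so the two numbers are most informative when reported together.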
Affiliation(s)
- Justin Newberg
- Center for Bioimage Informatics, and Departments of Biological Sciences, Biomedical Engineering, and Machine Learning, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15217, USA
45
Automated, systematic determination of protein subcellular location using fluorescence microscopy. Subcell Biochem 2008; 43:263-76. [PMID: 17953398 DOI: 10.1007/978-1-4020-5943-8_12]
Abstract
Proteomics is the comprehensive study of all aspects of protein behavior. The subfield of location proteomics is concerned with the systematic analysis of the subcellular location of proteins. In order to perform high-resolution, high-throughput analysis of all protein location patterns, automation is needed both for acquisition and analysis. Automated methods for analyzing subcellular location patterns in fluorescence microscope images have been developed and shown to work well for static 2D and 3D images of single cells. This chapter reviews this work and describes current efforts to extend these approaches, including classification of temporal patterns and building of generative models to represent location patterns.
46
König I, Malley J, Pajevic S, Weimar C, Diener HC, Ziegler A. Patient-centered yes/no prognosis using learning machines. Int J Data Min Bioinform 2008; 2:289-341. [PMID: 19216340 PMCID: PMC2754835 DOI: 10.1504/ijdmb.2008.022149]
Abstract
In the last 15 years several machine learning approaches have been developed for classification and regression. In an intuitive manner we introduce the main ideas of classification and regression trees, support vector machines, bagging, boosting and random forests. We discuss differences in the use of machine learning in the biomedical community and the computer sciences. We propose methods for comparing machines on a sound statistical basis. Data from the German Stroke Study Collaboration is used for illustration. We compare the results from learning machines to those obtained by a published logistic regression and discuss similarities and differences.
Affiliation(s)
- I.R. König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
- J.D. Malley
- Center for Information Technology, National Institutes of Health, Bethesda, MD, USA
- S. Pajevic
- Center for Information Technology, National Institutes of Health, Bethesda, MD, USA
- C. Weimar
- Klinik und Poliklinik für Neurologie, Universität Duisburg-Essen, Germany
- H-C. Diener
- Klinik und Poliklinik für Neurologie, Universität Duisburg-Essen, Germany
- A. Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Ratzeburger Allee 160, 23538 Lübeck, Germany
47
Barbe L, Lundberg E, Oksvold P, Stenius A, Lewin E, Björling E, Asplund A, Pontén F, Brismar H, Uhlén M, Andersson-Svahn H. Toward a confocal subcellular atlas of the human proteome. Mol Cell Proteomics 2007; 7:499-508. [PMID: 18029348 DOI: 10.1074/mcp.m700325-mcp200]
Abstract
Information on protein localization at the subcellular level is important to map and characterize the proteome and to better understand the cellular functions of proteins. Here we report on a pilot study of 466 proteins in three human cell lines, aimed at enabling large-scale confocal microscopy analysis using protein-specific antibodies. Approximately 3000 high-resolution images were generated, and more than 80% of the analyzed proteins could be assigned to one or more subcellular compartments. The localizations of the proteins showed, in many cases, good agreement with the Gene Ontology localization prediction model. This is the first large-scale study to localize proteins to subcellular compartments using antibodies and confocal microscopy. The results suggest that this approach might be a valuable tool in conjunction with predictive models for protein localization.
Affiliation(s)
- Laurent Barbe
- Department of Biotechnology, AlbaNova University Center, Royal Institute of Technology, SE-106 91 Stockholm, Sweden
48
Lin CC, Tsai YS, Lin YS, Chiu TY, Hsiung CC, Lee MI, Simpson JC, Hsu CN. Boosting multiclass learning with repeating codes and weak detectors for protein subcellular localization. Bioinformatics 2007; 23:3374-81. [DOI: 10.1093/bioinformatics/btm497]
49
Marée R, Geurts P, Wehenkel L. Random subwindows and extremely randomized trees for image classification in cell biology. BMC Cell Biol 2007; 8 Suppl 1:S2. [PMID: 17634092 PMCID: PMC1924507 DOI: 10.1186/1471-2121-8-s1-s2]
Abstract
Background With improvements in biosensors and high-throughput image acquisition technologies, life science laboratories are able to perform an increasing number of experiments that generate large numbers of images at different imaging modalities and scales. This stresses the need for computer vision methods that automate image classification tasks. Results We illustrate the potential of our image classification method in cell biology by evaluating it on four datasets of images related to protein distributions or subcellular localizations, and red-blood cell shapes. Accuracy is quite good without any specific preprocessing or incorporation of domain knowledge. The method is implemented in Java and is available upon request for evaluation and research purposes. Conclusion Our method is directly applicable to any image classification problem. We foresee the use of this automatic approach as a baseline method and a first attempt on various biological image classification problems.
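A rough sketch of the random-subwindow idea: extract many subwindows of random size and position, resize them to a fixed size, classify each one, and average the class probabilities over the image. Nearest-neighbor resizing, the 16×16 default output size and the names `random_subwindows` and `predict_image` are assumptions made for this sketch. `predict_proba` can be any callable that returns class probabilities, for example the `predict_proba` method of a fitted scikit-learn `ExtraTreesClassifier`.

```python
import numpy as np

def random_subwindows(img, n, out_size=16, rng=None):
    """Extract n square subwindows of random size and position, resized to out_size.

    Resizing uses nearest-neighbor sampling to stay dependency-free.
    Returns an array of shape (n, out_size * out_size) of flattened pixels.
    """
    rng = np.random.default_rng(rng)
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    out = np.empty((n, out_size * out_size))
    for i in range(n):
        size = int(rng.integers(1, min(h, w) + 1))  # side length in [1, min(h, w)]
        y0 = int(rng.integers(0, h - size + 1))
        x0 = int(rng.integers(0, w - size + 1))
        patch = img[y0:y0 + size, x0:x0 + size]
        idx = (np.arange(out_size) * size) // out_size  # nearest-neighbor indices
        out[i] = patch[np.ix_(idx, idx)].ravel()
    return out

def predict_image(img, predict_proba, n=100, rng=None):
    """Average per-subwindow class probabilities and return the winning class index."""
    probs = np.asarray(predict_proba(random_subwindows(img, n, rng=rng)))
    return int(probs.mean(axis=0).argmax())
```

Averaging probabilities over many subwindows lets the image-level decision draw on evidence from the whole image rather than from a single crop.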
Affiliation(s)
- Raphaël Marée
- GIGA Bioinformatics Platform, University of Liege, B34 Avenue de l'Hopital 1, Liege, 4000, Belgium
- Bioinformatics and Modeling, Department of Electrical Engineering and Computer Science & GIGA Research, University of Liege, B28 Grande Traverse 10, Liege, 4000, Belgium
- Pierre Geurts
- Bioinformatics and Modeling, Department of Electrical Engineering and Computer Science & GIGA Research, University of Liege, B28 Grande Traverse 10, Liege, 4000, Belgium
- Louis Wehenkel
- Bioinformatics and Modeling, Department of Electrical Engineering and Computer Science & GIGA Research, University of Liege, B28 Grande Traverse 10, Liege, 4000, Belgium
50
Starkuviene V, Pepperkok R. The potential of high-content high-throughput microscopy in drug discovery. Br J Pharmacol 2007; 152:62-71. [PMID: 17603554 PMCID: PMC1978277 DOI: 10.1038/sj.bjp.0707346]
Abstract
Fluorescence microscopy is a powerful method to study protein function in its natural habitat, the living cell. With the availability of the green fluorescent protein and its spectral variants, almost any gene of interest can be fluorescently labelled in living cells opening the possibility to study protein localization, dynamics and interactions. The emergence of automated cellular systems allows rapid visualization of large groups of cells and phenotypic analysis in a quantitative manner. Here, we discuss recent advances in high-content high-throughput microscopy and its potential application to several steps of the drug discovery process.
Affiliation(s)
- V Starkuviene
- Cell Biology and Cell Biophysics Unit, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany.