Wang F, Wei L. Multi-scale deep learning for the imbalanced multi-label protein subcellular localization prediction based on immunohistochemistry images. Bioinformatics 2022; 38:2602-2611. [PMID: 35212728 DOI: 10.1093/bioinformatics/btac123]
[Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/15/2021] [Revised: 02/09/2022] [Accepted: 02/24/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION
Advances in microscopic imaging make it possible to study protein subcellular locations from the tissue level down to the cell level, and have driven the rapid development of image-based approaches to protein subcellular location prediction. However, existing methods suffer from intrinsic limitations, such as poor feature representation, data imbalance, and the difficulty of multi-label classification, all of which greatly impair model performance and generalization.
RESULTS
In this study, we propose MSTLoc, a novel multi-scale end-to-end deep learning model that identifies protein subcellular locations in imbalanced multi-label immunohistochemistry (IHC) image datasets. MSTLoc uses a deep convolutional neural network to extract multi-scale features from the IHC images, aggregates high-level and low-level features via feature fusion to exploit the dependencies among subcellular locations, and uses a Vision Transformer (ViT) to model the relationships among the features and enhance feature representation. We demonstrate that MSTLoc outperforms current state-of-the-art models in multi-label subcellular location prediction. Through feature visualization and interpretation analysis, we show that, compared with hand-crafted features, the multi-scale deep features learnt by our model better capture the discriminative patterns underlying protein subcellular locations, and that features from different scales are complementary in improving performance. Finally, case study results indicate that MSTLoc can successfully identify some biomarkers among proteins that are closely involved in cancer development. For convenient use of our method, we provide a user-friendly webserver at http://server.wei-group.net/MSTLoc.
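As a rough illustration of the two ideas named above, multi-scale feature fusion and multi-label output, the following minimal sketch pools feature maps from two scales, concatenates them, and scores each location label independently with a sigmoid. It is not the MSTLoc implementation: the fusion by global average pooling and concatenation, the function names, the toy feature maps, and the weights are all assumptions made for illustration, and the ViT stage is omitted.

```python
import math
from typing import List, Sequence

# A feature map is indexed as channels x height x width.
FeatureMap = List[List[List[float]]]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Reduce each channel of a C x H x W map to its mean activation."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


def fuse_scales(feature_maps: Sequence[FeatureMap]) -> List[float]:
    """Concatenate the pooled descriptors of all scales into one vector."""
    fused: List[float] = []
    for fmap in feature_maps:
        fused.extend(global_average_pool(fmap))
    return fused


def predict_labels(
    fused: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
    threshold: float = 0.5,
) -> List[bool]:
    """Score each label with its own sigmoid, so several labels can be active."""
    labels = []
    for w_row, b in zip(weights, biases):
        score = sum(w * x for w, x in zip(w_row, fused)) + b
        prob = 1.0 / (1.0 + math.exp(-score))
        labels.append(prob >= threshold)
    return labels


# Toy input (assumed): a low-level map (1 channel, 4x4) and a
# high-level map (2 channels, 2x2), standing in for CNN outputs.
low_level = [[[1.0] * 4 for _ in range(4)]]
high_level = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 4.0], [4.0, 0.0]]]
fused = fuse_scales([low_level, high_level])

# Hypothetical weights for three location labels; real weights would be learned.
weights = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
biases = [0.0, 0.0, -1.0]
predicted = predict_labels(fused, weights, biases)
print(fused, predicted)
```

Because each label has its own sigmoid rather than sharing a softmax, one image can be assigned to several subcellular locations at once, which is what the multi-label setting requires.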
AVAILABILITY AND IMPLEMENTATION
http://server.wei-group.net/MSTLoc.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.