1. CA-STD: Scene Text Detection in Arbitrary Shape Based on Conditional Attention. INFORMATION 2022. [DOI: 10.3390/info13120565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Scene Text Detection (STD) is critical for obtaining textual information from natural scenes, with applications in automated driving and security surveillance. However, existing text detection methods fall short when dealing with the variation in text curvatures, orientations, and aspect ratios in complex backgrounds. To meet this challenge, we propose a method called CA-STD to detect arbitrarily shaped text against complicated backgrounds. First, a Feature Refinement Module (FRM) is proposed to enhance feature representation. Second, a conditional attention mechanism is proposed both to decouple the spatial and textual information in scene text images and to model the relationships among different feature vectors. Finally, Contour Information Aggregation (CIA) is presented to enrich the feature representation of text contours by considering circular topology and semantic information simultaneously, yielding detection curves with arbitrary shapes. CA-STD is evaluated with extensive experiments on different datasets. On TotalText, it outperforms state-of-the-art methods and achieves a precision of 82.9. On CTW-1500, it outperforms state-of-the-art methods and achieves an F1 score of 83.8. The quantitative and qualitative analyses show that CA-STD can detect variably shaped scene text effectively.
2. STR Transformer: A Cross-domain Transformer for Scene Text Recognition. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03728-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
3. MKD: Mixup-Based Knowledge Distillation for Mandarin End-to-End Speech Recognition. ALGORITHMS 2022. [DOI: 10.3390/a15050160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Large-scale automatic speech recognition (ASR) models have achieved impressive performance. However, huge computational resources and massive amounts of data are required to train an ASR model. Knowledge distillation is a prevalent model compression method that transfers knowledge from a large model to a small model. To improve the efficiency of knowledge distillation for end-to-end speech recognition, especially in the low-resource setting, a Mixup-based Knowledge Distillation (MKD) method is proposed, which combines Mixup, a data-agnostic data augmentation method, with softmax-level knowledge distillation. A loss-level mixture is presented to address the problem caused by the non-linearity of the label in the KL-divergence when applying Mixup to the teacher–student framework. It is mathematically shown that optimizing the mixture of loss functions is equivalent to optimizing an upper bound of the original knowledge distillation loss. The proposed MKD takes advantage of Mixup and brings robustness to the model even with a small amount of training data. Experiments on Aishell-1 show that MKD obtains 15.6% and 3.3% relative improvements on two student models with different parameter scales compared with existing methods. Experiments on data efficiency demonstrate that MKD achieves similar results with only half of the original dataset.
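The upper-bound claim in this abstract can be checked numerically on a toy example. The sketch below assumes the loss-level mixture takes the usual Mixup form, λ·KL(teacher_a ‖ student) + (1 − λ)·KL(teacher_b ‖ student), and compares it with a KL term computed on the mixed teacher label. The abstract does not give the exact formulation, so the function names, the three-class distributions, and the value of λ are illustrative assumptions rather than the authors' implementation. The inequality itself follows from the convexity of the KL divergence in its first argument.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def label_level_loss(teacher_a, teacher_b, student, lam):
    """KL computed on the mixed teacher label (hypothetical baseline)."""
    mixed = [lam * a + (1 - lam) * b for a, b in zip(teacher_a, teacher_b)]
    return kl_divergence(mixed, student)


def loss_level_mixture(teacher_a, teacher_b, student, lam):
    """Mixture of the two per-sample KL losses (assumed form of the loss-level mixture)."""
    return (lam * kl_divergence(teacher_a, student)
            + (1 - lam) * kl_divergence(teacher_b, student))


# Toy softmax outputs: two teacher predictions and one student prediction
# for the mixed input. All values are made up for illustration.
teacher_a = [0.7, 0.2, 0.1]
teacher_b = [0.1, 0.3, 0.6]
student = [0.4, 0.3, 0.3]
lam = 0.3

label_level = label_level_loss(teacher_a, teacher_b, student, lam)
loss_level = loss_level_mixture(teacher_a, teacher_b, student, lam)

# By convexity of KL in its first argument, the loss-level mixture
# is never smaller than the label-level loss.
print(label_level <= loss_level)
```

Minimizing `loss_level` therefore also pushes down `label_level`, which is the sense in which a mixture of losses can serve as an upper bound on a distillation loss with a mixed label.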
4. Face aging with pixel-level alignment GAN. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03541-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
5. Detection of human lower limb mechanical axis key points and its application on patella misalignment detection. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02718-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
6. van Harten LD, de Jonge CS, Beek KJ, Stoker J, Išgum I. Untangling and segmenting the small intestine in 3D cine-MRI using deep learning. Med Image Anal 2022; 78:102386. [DOI: 10.1016/j.media.2022.102386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 12/23/2021] [Accepted: 02/01/2022] [Indexed: 10/19/2022]
7. Arkko A, Kaseva T, Salli E, Mäkelä T, Savolainen S, Kangasniemi M. Automatic detection of Crohn's disease using quantified motility in magnetic resonance enterography: initial experiences. Clin Radiol 2021; 77:96-103. [PMID: 34753588 DOI: 10.1016/j.crad.2021.10.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/02/2021] [Accepted: 10/06/2021] [Indexed: 12/13/2022]
Abstract
AIM: To report initial experiences of automatic detection of Crohn's disease (CD) using quantified motility in magnetic resonance enterography (MRE).
MATERIALS AND METHODS: From 302 patients, three datasets with roughly equal proportions of CD and non-CD cases with various illnesses were drawn for testing and for neural network training and validation. All datasets had unique MRE parameter configurations and were acquired in free breathing. Nine neural networks were devised for automatic generation of three different regions of interest (ROI): small bowel, all bowel, and non-bowel. Additionally, a full-image ROI was tested. The motility in an MRE series was quantified via a registration procedure, which, combined with the given ROIs, resulted in three motility indices (MI). A subset of the indices was used as input for a binary logistic regression classifier, which predicted whether the MRE series represented CD.
RESULTS: The highest mean area under the curve (AUC) score, 0.78, was reached using the full-image ROI and the dataset with the longest cine series. The best AUC scores for the other two datasets were only 0.54 and 0.49.
CONCLUSION: The automatic system was able to detect CD in the group of MRE studies with lower temporal resolution and longer cine series, showing potential in primary bowel disorder diagnostics. Larger ROI selections and utilising all available cine series for motility registration yielded slight performance improvements.
Affiliation(s)
- A Arkko: HUS Medical Imaging Center, Radiology, Helsinki University Hospital and University of Helsinki, P.O. Box 340, FI-00290, Helsinki, Finland
- T Kaseva: HUS Medical Imaging Center, Radiology, Helsinki University Hospital and University of Helsinki, P.O. Box 340, FI-00290, Helsinki, Finland
- E Salli: HUS Medical Imaging Center, Radiology, Helsinki University Hospital and University of Helsinki, P.O. Box 340, FI-00290, Helsinki, Finland
- T Mäkelä: HUS Medical Imaging Center, Radiology, Helsinki University Hospital and University of Helsinki, P.O. Box 340, FI-00290, Helsinki, Finland; Department of Physics, University of Helsinki, P.O. Box 64, FI-00014, Helsinki, Finland
- S Savolainen: HUS Medical Imaging Center, Radiology, Helsinki University Hospital and University of Helsinki, P.O. Box 340, FI-00290, Helsinki, Finland; Department of Physics, University of Helsinki, P.O. Box 64, FI-00014, Helsinki, Finland
- M Kangasniemi: HUS Medical Imaging Center, Radiology, Helsinki University Hospital and University of Helsinki, P.O. Box 340, FI-00290, Helsinki, Finland
8. Improved direction-of-arrival estimation method based on LSTM neural networks with robustness to array imperfections. APPL INTELL 2021. [DOI: 10.1007/s10489-020-02124-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
9. Wang Y, Gong G, Kong D, Li Q, Dai J, Zhang H, Qu J, Liu X, Xue J. Pancreas segmentation using a dual-input v-mesh network. Med Image Anal 2021; 69:101958. [PMID: 33550009 DOI: 10.1016/j.media.2021.101958] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/24/2020] [Revised: 12/03/2020] [Accepted: 12/31/2020] [Indexed: 11/18/2022]
Abstract
Accurate segmentation of the pancreas from abdomen scans is crucial for the diagnosis and treatment of pancreatic diseases. However, the pancreas is a small, soft and elastic abdominal organ with high anatomical variability and has a low tissue contrast in computed tomography (CT) scans, which makes segmentation tasks challenging. To address this challenge, we propose a dual-input v-mesh fully convolutional network (FCN) to segment the pancreas in abdominal CT images. Specifically, dual inputs, i.e., original CT scans and images processed by a contrast-specific graph-based visual saliency (GBVS) algorithm, are simultaneously sent to the network to improve the contrast of the pancreas and other soft tissues. To further enhance the ability to learn context information and extract distinct features, a v-mesh FCN with an attention mechanism is initially utilized. In addition, we propose a spatial transformation and fusion (SF) module to better capture the geometric information of the pancreas and facilitate feature map fusion. We compare the performance of our method with several baseline and state-of-the-art methods on the publicly available NIH dataset. The comparison results show that our proposed dual-input v-mesh FCN model outperforms previous methods in terms of the Dice similarity coefficient (DSC), positive predictive value (PPV), sensitivity (SEN), average surface distance (ASD) and Hausdorff distance (HD). Moreover, ablation studies show that our proposed modules/structures are critical for effective pancreas segmentation.
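Three of the overlap metrics named in this abstract have standard definitions that are easy to state in code. The following is a minimal sketch using flattened binary masks as plain Python lists; the function names, the toy masks, and the conventions for empty masks are assumptions for illustration and are not taken from the paper. The surface-based metrics (ASD and HD) need boundary extraction and are not covered here.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives
    for two equally long binary masks (lists of 0/1)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def dice(pred, truth):
    """Dice similarity coefficient: 2*TP / (2*TP + FP + FN).
    Assumed convention: two empty masks count as a perfect match."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def ppv(pred, truth):
    """Positive predictive value (precision): TP / (TP + FP).
    Assumed convention: 0.0 when nothing is predicted positive."""
    tp, fp, _ = confusion_counts(pred, truth)
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(pred, truth):
    """Sensitivity (recall): TP / (TP + FN).
    Assumed convention: 0.0 when the reference mask is empty."""
    tp, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy flattened masks (made-up values): 2 true positives,
# 1 false positive, 1 false negative.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 1, 1, 0]

print(dice(pred_mask, true_mask))
print(ppv(pred_mask, true_mask))
print(sensitivity(pred_mask, true_mask))
```

For a 3D CT volume the same functions apply once the voxel masks are flattened into one list per volume.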
Affiliation(s)
- Yuan Wang: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Guanzhong Gong: Shandong Cancer Hospital and Institute, Shandong First Medical University and Shandong Academy of Medical Sciences, China
- Deting Kong: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Qi Li: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Jinpeng Dai: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Hongyan Zhang: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Jianhua Qu: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Xiyu Liu: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
- Jie Xue: Business School, Academy of Management Science, Shandong Normal University, Jinan, Shandong 250014, China
10. COVID-AL: The diagnosis of COVID-19 with deep active learning. Med Image Anal 2020; 68:101913. [PMID: 33285482 PMCID: PMC7689310 DOI: 10.1016/j.media.2020.101913] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 06/03/2020] [Revised: 08/02/2020] [Accepted: 11/12/2020] [Indexed: 12/15/2022]
Abstract
The COVID-AL framework simultaneously considers the sample diversity and the predicted loss to improve the efficiency of active learning methods. Weakly supervised active learning is performed with patient-level labels in the proposed COVID-AL framework. A 2D U-Net and a 3D residual network are tailor-designed for the lung region segmentation and the diagnosis of COVID-19.
The efficient diagnosis of COVID-19 plays a key role in preventing the spread of this disease. Computer-aided diagnosis with deep learning methods can perform automatic detection of COVID-19 using CT scans. However, large-scale annotation of CT scans is impossible because of limited time and the heavy burden on the healthcare system. To meet this challenge, we propose a weakly supervised deep active learning framework called COVID-AL to diagnose COVID-19 with CT scans and patient-level labels. COVID-AL consists of lung region segmentation with a 2D U-Net and the diagnosis of COVID-19 with a novel hybrid active learning strategy, which simultaneously considers sample diversity and predicted loss. With a tailor-designed 3D residual network, the proposed COVID-AL can diagnose COVID-19 efficiently, and it is validated on a large CT scan dataset collected from the CC-CCII. The experimental results demonstrate that the proposed COVID-AL outperforms state-of-the-art active learning approaches in the diagnosis of COVID-19. With only 30% of the labeled data, COVID-AL achieves over 95% of the accuracy of the deep learning method trained on the whole dataset. The qualitative and quantitative analyses demonstrate the effectiveness and efficiency of the proposed COVID-AL framework.
11.
12. Zhou Z, Feng Z, Hu C, Hu G, He W, Han X. Aeronautical relay health state assessment model based on belief rule base with attribute reliability. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105869] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 10/24/2022]