1. Jian M, Jin H, Zhang L, Wei B, Yu H. DBPNDNet: dual-branch networks using 3DCNN toward pulmonary nodule detection. Med Biol Eng Comput 2024;62:563-573. [PMID: 37945795] [DOI: 10.1007/s11517-023-02957-1]
Abstract
With the advancement of artificial intelligence, CNNs have been successfully introduced into the discipline of medical data analysis. Clinically, automatic pulmonary nodule detection remains an intractable problem, since nodules in the lung parenchyma or on the chest wall are difficult to distinguish visually from shadows, background noise, blood vessels, and bones. When making a diagnosis, clinicians therefore attend first to the intensity cues and contour characteristics of pulmonary nodules in order to locate them spatially. To automate this process, we propose an efficient multi-task, dual-branch 3D convolutional neural network architecture, called DBPNDNet, for automatic pulmonary nodule detection and segmentation. Within the dual-branch structure, one branch extracts candidate regions for nodule detection, while the other performs semantic segmentation of the nodule lesion region. In addition, we develop a 3D attention-weighted feature fusion module informed by the clinician's diagnostic perspective, so that the information captured by the segmentation branch further strengthens the detection branch. The framework was evaluated on a commonly used dataset for medical image analysis. On average, it achieved a sensitivity of 91.33% across the evaluated false-positive rates per CT scan, reaching 97.14% sensitivity at 8 FPs per scan. The experimental results indicate that our framework outperforms other mainstream approaches.
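For intuition, the attention-weighted fusion of the segmentation branch into the detection branch could be sketched roughly as below; this is a hypothetical PyTorch illustration (the module name, channel widths, and residual gating are assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class AttentionWeightedFusion3D(nn.Module):
    """Fuse segmentation-branch features into the detection branch (sketch).

    A 3D attention map is predicted from the segmentation features and used
    to re-weight the detection features voxel-wise, so regions the
    segmentation branch considers lesion-like are emphasized.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv3d(channels, channels // 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),  # per-voxel weight in [0, 1]
        )

    def forward(self, det_feat: torch.Tensor, seg_feat: torch.Tensor) -> torch.Tensor:
        w = self.attn(seg_feat)          # (N, 1, D, H, W), broadcast over channels
        return det_feat + det_feat * w   # residual re-weighting of detection features

# Example: fuse two (N, C, D, H, W) feature volumes.
det = torch.randn(1, 32, 16, 48, 48)
seg = torch.randn(1, 32, 16, 48, 48)
print(AttentionWeightedFusion3D(32)(det, seg).shape)  # torch.Size([1, 32, 16, 48, 48])
```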
Affiliation(s)
- Muwei Jian: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; School of Information Science and Technology, Linyi University, Linyi, China
- Haodong Jin: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; School of Control Engineering, University of Shanghai for Science and Technology, Shanghai, China
- Linsong Zhang: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
- Benzheng Wei: Medical Artificial Intelligence Research Center, Shandong University of Traditional Chinese Medicine, Qingdao, China
- Hui Yu: School of Control Engineering, University of Shanghai for Science and Technology, Shanghai, China; School of Creative Technologies, University of Portsmouth, Portsmouth, UK
2. Yin Y, Han Z, Jian M, Wang GG, Chen L, Wang R. AMSUnet: A neural network using atrous multi-scale convolution for medical image segmentation. Comput Biol Med 2023;162:107120. [PMID: 37276753] [DOI: 10.1016/j.compbiomed.2023.107120]
Abstract
In recent years, Unet and its variants have achieved remarkable success in medical image processing. However, some Unet variants improve performance at the cost of a dramatic increase in parameter count. To jointly pursue lightweight design and performance enhancement, and inspired by SegNeXt, we develop a medical image segmentation network using atrous multi-scale (AMS) convolution, named AMSUnet. In particular, we construct a convolutional attention block (AMS) from atrous and multi-scale convolutions, and redesign the downsampling encoder around this block, calling it AMSE. To enhance feature fusion, we design a residual attention mechanism module (RSC) and apply it to the skip connections. Compared with existing models, ours needs only 2.62 M parameters to achieve this lightweight goal. Experimental results on various datasets show that the model's segmentation performance is superior for small, medium, and large-scale targets. Code will be available at https://github.com/llluochen/AMSUnet.
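A minimal PyTorch sketch of the atrous multi-scale idea follows: parallel depthwise dilated branches whose summed responses gate the input. The branch count, dilation rates, and gating form are assumptions here, not the repository's actual block:

```python
import torch
import torch.nn as nn

class AMSBlock(nn.Module):
    """Atrous multi-scale convolutional attention (sketch).

    Parallel depthwise convolutions with different dilation rates gather
    multi-scale context cheaply; their mixed sum gates the input features.
    """

    def __init__(self, channels: int, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=r,
                      dilation=r, groups=channels)  # depthwise, atrous
            for r in rates
        ])
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.mix(sum(branch(x) for branch in self.branches))
        return x * torch.sigmoid(attn)  # attention-gated features

x = torch.randn(1, 64, 128, 128)
print(AMSBlock(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```

With padding equal to the dilation rate, each 3x3 branch preserves the spatial size, so the branch outputs can be summed directly.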
Affiliation(s)
- Yunchou Yin: School of Computer Science and Technology, Ocean University of China, Qingdao, China
- Zhimeng Han: School of Computer Science and Technology, Ocean University of China, Qingdao, China
- Muwei Jian: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; School of Information Science and Technology, Linyi University, Linyi, China
- Gai-Ge Wang: School of Computer Science and Technology, Ocean University of China, Qingdao, China
- Liyan Chen: Institute of Big Data and Information Technology, Wenzhou University, Wenzhou, China
- Rui Wang: College of Systems Engineering, National University of Defense Technology, Changsha, China; Xiangjiang Laboratory, Changsha, China
3. STI-Net: Spatiotemporal Integration Network for Video Saliency Detection. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.106]
4. Jian M, Jin H, Liu X, Zhang L. Multiscale Cascaded Attention Network for Saliency Detection Based on ResNet. Sensors (Basel) 2022;22:9950. [PMID: 36560319] [PMCID: PMC9783234] [DOI: 10.3390/s22249950]
Abstract
Saliency detection is a key research topic in computer vision. Through the visual perception areas of the brain, humans can quickly and accurately lock onto regions of interest in complex, changing scenes. Although existing saliency-detection methods achieve competent performance, they suffer from deficiencies such as unclear margins of salient objects and interference from background information in the saliency map. In this study, to address these defects, we designed a multiscale cascaded attention network based on ResNet34. Departing from the typical U-shaped encoder-decoder architecture, we devised a contextual feature extraction module to strengthen high-level semantic feature extraction. Specifically, a multiscale cascade block (MCB) and a lightweight channel attention (CA) module were inserted between the encoder and decoder for optimization. To address the blurred-edge issue neglected by many previous approaches, we adopted an edge-thinning module that performs a deeper edge-thinning process on the output layer. The experimental results illustrate that this method achieves competitive saliency-detection performance, with improved accuracy and recall compared with other representative methods.
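A lightweight channel attention (CA) module of the kind mentioned belongs to the squeeze-and-excitation family; a minimal sketch follows (the reduction ratio and exact design are assumptions, not the paper's module):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Lightweight channel attention (squeeze-and-excitation style sketch)."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                             # per-channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                   # re-weight channels

x = torch.randn(2, 64, 56, 56)
print(ChannelAttention(64)(x).shape)  # torch.Size([2, 64, 56, 56])
```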
Affiliation(s)
- Muwei Jian: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China; School of Information Science and Technology, Linyi University, Linyi 276012, China
- Haodong Jin: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
- Xiangyu Liu: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
- Linsong Zhang: School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China
5. Audio–visual collaborative representation learning for Dynamic Saliency Prediction. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109675]
6. Trapani A, Sheiban FJ, Bertone E, Chiosso S, Colombo L, D'Andrea M, De Santis F, Fati F, Fossati V, Gonzalez V, Pedrocchi A. Reproducing a decision-making network in a virtual visual discrimination task. Front Integr Neurosci 2022;16:930326. [PMID: 36035443] [PMCID: PMC9399926] [DOI: 10.3389/fnint.2022.930326]
Abstract
We reproduced a decision-making network model using the NEST neural simulator (Neural Simulation Tool), and we embedded the spiking neural network in a virtual robotic agent performing a simulated behavioral task. The present work builds on the concept of replicability in neuroscience, preserving most of the computational properties of the initial model while employing a different software tool. The proposed implementation successfully reproduces the results of the original study, capturing the salient features of the neural processes underlying a binary decision. Furthermore, the resulting network is able to control a robot performing an in silico visual discrimination task, an implementation that is openly available on the EBRAINS infrastructure through the Neurorobotics Platform (NRP).
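Binary-decision networks of this kind are commonly modeled as two mutually inhibiting populations racing to threshold. The toy NEST sketch below (assuming NEST 3.x; population sizes, weights, and rates are illustrative only, not the replicated model) shows the pattern:

```python
import nest

nest.ResetKernel()

# Two competing excitatory populations, one per choice alternative.
pop_a = nest.Create("iaf_psc_alpha", 100)
pop_b = nest.Create("iaf_psc_alpha", 100)

# Cross-inhibition (abstracting the inhibitory pool into direct
# negative weights): each population suppresses its competitor.
nest.Connect(pop_a, pop_b, syn_spec={"weight": -30.0})
nest.Connect(pop_b, pop_a, syn_spec={"weight": -30.0})

# Biased sensory evidence as Poisson input (stronger drive to A).
drive_a = nest.Create("poisson_generator", params={"rate": 9000.0})
drive_b = nest.Create("poisson_generator", params={"rate": 8000.0})
nest.Connect(drive_a, pop_a, syn_spec={"weight": 60.0})
nest.Connect(drive_b, pop_b, syn_spec={"weight": 60.0})

rec_a = nest.Create("spike_recorder")
rec_b = nest.Create("spike_recorder")
nest.Connect(pop_a, rec_a)
nest.Connect(pop_b, rec_b)

nest.Simulate(500.0)  # ms

# The population with the higher firing rate "wins" the decision.
print("A spikes:", rec_a.n_events, " B spikes:", rec_b.n_events)
```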
7. EHDC: enhanced dilated convolution framework for underwater blurred target recognition. Robotica 2022. [DOI: 10.1017/s0263574722001059]
Abstract
Autonomous underwater vehicles (AUVs) suffer from feature loss when recognizing small underwater targets. Current algorithms usually address this with multi-scale feature extraction, but that approach increases computational cost. In addition, low underwater light and turbid water leave target feature information incomplete. This paper proposes an enhanced dilated convolution framework (EHDC) for underwater blurred target recognition. First, we extract small-target features through hybrid dilated convolution networks, enlarging the receptive field of the algorithm without increasing its computational cost. Second, the proposed algorithm learns spatial semantic features through an adaptive correlation matrix, compensating for the missing features of the target. Finally, we fuse spatial semantic features and visual features to recognize small, blurred underwater targets. Experiments show that the proposed method improves recognition accuracy by 1.04% over existing methods on this task.
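A hybrid dilated convolution stack enlarges the receptive field with no extra parameters per layer; the sketch below uses the common co-prime rate pattern (1, 2, 5), which is an assumption here rather than EHDC's exact configuration:

```python
import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    """Stack of 3x3 convolutions with mixed dilation rates (sketch).

    Co-prime rates tile the receptive field densely, avoiding the
    "gridding" holes that a single fixed rate would leave, while keeping
    the parameter count of plain 3x3 convolutions.
    """

    def __init__(self, channels: int, rates=(1, 2, 5)):
        super().__init__()
        layers = []
        for r in rates:
            layers += [
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + x  # residual: keep low-level detail

x = torch.randn(1, 32, 64, 64)
print(HybridDilatedBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```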
8. Zhang S, Ren Y, Wang J, Song B, Li R, Xu Y. GSTCNet: Gated spatio-temporal correlation network for stroke mortality prediction. Math Biosci Eng 2022;19:9966-9982. [PMID: 36031978] [DOI: 10.3934/mbe.2022465]
Abstract
Stroke remains the most common cause of death in China, so mortality prediction for stroke patients is of great significance, especially analysis of the complex interactions among non-negligible risk factors. In this paper, we present a gated spatio-temporal correlation network (GSTCNet) to predict one-year post-stroke mortality. Based on four categories of risk factors (vascular event, chronic disease, medical usage, and surgery), we designed a gated correlation graph convolution kernel to capture spatial features and enhance the spatial correlations between feature categories. A Bi-LSTM represents the temporal features across five timestamps. A novel gated correlation attention mechanism is then connected to the Bi-LSTM to comprehensively mine spatio-temporal correlations. Using data on 2275 patients obtained from the neurology department of a local hospital, we conducted a series of sequential experiments. The results show that the proposed model achieves competitive scores on each evaluation metric, reaching an AUC of 89.17%, a precision of 97.75%, a recall of 95.33%, and an F1-score of 95.19%. Interpretability analysis of the feature categories and timestamps further verified the model's potential application value for stroke.
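The temporal half of such a model, a Bi-LSTM over five timestamps with a gated attention read-out, might look roughly like this (feature sizes and the gating form are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class GatedTemporalHead(nn.Module):
    """Bi-LSTM over clinical-feature sequences with a gated read-out (sketch)."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.gate = nn.Linear(2 * hidden, 1)  # per-timestamp relevance
        self.clf = nn.Linear(2 * hidden, 1)   # mortality logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(x)                      # (N, T, 2*hidden)
        a = torch.softmax(self.gate(h), dim=1)   # attention over timestamps
        ctx = (a * h).sum(dim=1)                 # gated temporal summary
        return self.clf(ctx).squeeze(-1)         # (N,) logits

# Example: 4 patients x 5 timestamps x 32 risk-factor features.
x = torch.randn(4, 5, 32)
print(GatedTemporalHead(32)(x).shape)  # torch.Size([4])
```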
Affiliation(s)
- Shuo Zhang: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
- Yonghao Ren: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
- Jing Wang: School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450000, China; Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
- Bo Song: Department of Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450000, China; NHC Key Laboratory of Prevention and Treatment of Cerebrovascular Diseases, Zhengzhou 450000, China
- Runzhi Li: Cooperative Innovation Center of Internet Healthcare, Zhengzhou University, Zhengzhou 450000, China
- Yuming Xu: Department of Neurology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou 450000, China; NHC Key Laboratory of Prevention and Treatment of Cerebrovascular Diseases, Zhengzhou 450000, China
9. Efficient and Scalable Object Localization in 3D on Mobile Device. J Imaging 2022;8:188. [PMID: 35877632] [PMCID: PMC9323171] [DOI: 10.3390/jimaging8070188]
Abstract
Two-dimensional (2D) object detection has been an intensely discussed and researched field of computer vision. Despite the many advances made over the years, we still lack a robust approach for efficiently classifying and localizing objects in our environment using only a mobile device. Moreover, 2D object detection limits the overall understanding of a detected object, providing no information about its size and position in the real world. This work proposes a novel object localization solution in three dimensions (3D) for mobile devices. The method combines a 2D object detection convolutional neural network (CNN) model with augmented reality (AR) technologies to recognize objects in the environment and determine their real-world coordinates. We leverage the built-in simultaneous localization and mapping (SLAM) capability of Google's ARCore to detect planes and obtain camera information for generating cuboid proposals from an object's 2D bounding box. The proposed method is fast and efficient at identifying everyday objects in real-world space and, unlike mobile offloading techniques, is designed to work within the limited resources of a mobile device.
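The core geometric step, lifting a 2D bounding-box corner onto a detected plane to seed a cuboid proposal, reduces to ray-plane intersection. A minimal numpy sketch under a pinhole-camera assumption follows (the pose, intrinsics, and helper name are illustrative; the plane is taken as given, as ARCore would supply it):

```python
import numpy as np

def backproject_to_plane(px, K, cam_pos, R, plane_point, plane_normal):
    """Intersect the camera ray through pixel `px` with a world plane.

    K: 3x3 intrinsics; R: 3x3 camera-to-world rotation; cam_pos: camera
    center in world coordinates (quantities an AR framework exposes).
    """
    ray_cam = np.linalg.inv(K) @ np.array([px[0], px[1], 1.0])
    ray_world = R @ ray_cam                       # ray direction in world frame
    t = np.dot(plane_point - cam_pos, plane_normal) / np.dot(ray_world, plane_normal)
    return cam_pos + t * ray_world                # 3D hit point on the plane

# Example: project the two bottom corners of a 2D box onto a horizontal
# floor plane to obtain one edge of the cuboid's base.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
cam_pos = np.array([0.0, 1.5, 0.0])               # camera 1.5 m above the floor
R = np.diag([1.0, -1.0, 1.0])                     # toy pose: image y-down, world y-up
floor_pt, floor_n = np.zeros(3), np.array([0.0, 1.0, 0.0])
for corner in [(250, 400), (390, 400)]:
    print(backproject_to_plane(corner, K, cam_pos, R, floor_pt, floor_n))
```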
10. Choqueluque-Roman D, Camara-Chavez G. Weakly Supervised Violence Detection in Surveillance Video. Sensors (Basel) 2022;22:4502. [PMID: 35746286] [PMCID: PMC9231349] [DOI: 10.3390/s22124502]
Abstract
Automatic violence detection in video surveillance is essential for social and personal security. Monitoring the large number of surveillance cameras in public and private areas is challenging for human operators, and the manual nature of the task greatly increases the chance of missing important events, given human limits on attending to multiple targets at once. Researchers have proposed several methods to detect violent events automatically. So far, most studies have focused only on classifying short clips, without spatial localization. In this work, we tackle the problem with a weakly supervised method that detects violent actions spatially and temporally in surveillance videos using only video-level labels. The proposed method follows a temporally extended Fast R-CNN-style architecture. First, we generate spatiotemporal proposals (action tubes) leveraging pre-trained person detectors, motion appearance (dynamic images), and tracking algorithms. Then, given an input video and the action proposals, we extract spatiotemporal features using deep neural networks. Finally, a classifier based on multiple-instance learning is trained to label each action tube as violent or non-violent. We obtain results comparable to the state of the art on three public databases, Hockey Fight, RLVSD, and RWF-2000, achieving accuracies of 97.3%, 92.88%, and 88.7%, respectively.
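With only video-level labels, multiple-instance learning typically scores each tube and pools the maximum, so a single confidently violent tube flags the whole video. A minimal sketch (the scoring head and max pooling are assumptions, not the paper's exact classifier):

```python
import torch
import torch.nn as nn

class MILTubeClassifier(nn.Module):
    """Video-level training over bags of action-tube features (sketch).

    Each tube gets an instance score; max-pooling over the bag yields the
    video score, so only video-level labels are needed for training.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 1),
        )

    def forward(self, tubes: torch.Tensor) -> torch.Tensor:
        inst = self.score(tubes).squeeze(-1)  # (N_tubes,) instance logits
        return inst.max()                     # bag (video) logit

model = MILTubeClassifier(512)
tubes = torch.randn(6, 512)                   # 6 tube features from one video
label = torch.tensor(1.0)                     # video-level label: violent
loss = nn.functional.binary_cross_entropy_with_logits(model(tubes), label)
loss.backward()                               # gradient flows to the top-scoring tube
```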
Affiliation(s)
- David Choqueluque-Roman: Department of Computer Science, Universidad Católica San Pablo, Arequipa 04001, Peru
- Guillermo Camara-Chavez: Department of Computer Science, Federal University of Ouro Preto, Ouro Preto 35400-000, Brazil
11. Nicora E, Noceti N. On the Use of Efficient Projection Kernels for Motion-Based Visual Saliency Estimation. Front Comput Sci 2022. [DOI: 10.3389/fcomp.2022.867289]
Abstract
In this paper, we investigate the potential of a family of efficient filters, the Gray-Code Kernels (GCKs), for visual saliency estimation with a focus on motion information. Our implementation relies on 3D kernels applied to overlapping blocks of frames and gathers meaningful spatio-temporal information with very light computation. We introduce an attention module that reasons over pooling strategies, combined in an unsupervised way to derive a saliency map highlighting the presence of motion in the scene; a coarse segmentation map can also be obtained. In the experimental analysis, we evaluate our method on publicly available datasets and show that it effectively and efficiently identifies the portions of the image where motion occurs, with tolerance to a variety of scene conditions and complexities.
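A toy stand-in for the filter-and-pool pipeline: project frame blocks onto a small bank of ±1 3D kernels and pool the response magnitudes. Real GCK banks are larger, structured, and evaluated with a far cheaper recursive scheme; the kernels and pooling below are illustrative only:

```python
import numpy as np
from scipy.ndimage import correlate

def motion_saliency(block, kernels):
    """Project a (T, H, W) frame block onto ±1 3D kernels and pool (sketch).

    Plain correlation keeps the sketch short; GCK responses would instead
    be computed recursively with a constant number of ops per pixel.
    """
    responses = [np.abs(correlate(block, k, mode="nearest")) for k in kernels]
    return np.max(responses, axis=0).mean(axis=0)  # max over kernels, mean over time

# Two illustrative ±1 kernels: a temporal-derivative-like kernel and a
# spatio-temporal one.
k_t = np.array([[[1.0]], [[-1.0]]])              # shape (2, 1, 1): frame difference
k_st = np.array([[[1.0, -1.0]], [[-1.0, 1.0]]])  # shape (2, 1, 2)
frames = np.random.rand(8, 64, 64)               # 8-frame block
print(motion_saliency(frames, [k_t, k_st]).shape)  # (64, 64) saliency map
```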