1. Dilek E, Dener M. Computer Vision Applications in Intelligent Transportation Systems: A Survey. Sensors (Basel) 2023; 23:2938. PMID: 36991649; PMCID: PMC10051529; DOI: 10.3390/s23062938.
Abstract
As technology continues to develop, computer vision (CV) applications are becoming increasingly widespread in the intelligent transportation systems (ITS) context. These applications are developed to improve the efficiency of transportation systems, increase their level of intelligence, and enhance traffic safety. Advances in CV play an important role in solving problems in the fields of traffic monitoring and control, incident detection and management, road usage pricing, and road condition monitoring, among many others, by providing more effective methods. This survey examines CV applications in the literature, the machine learning and deep learning methods used in ITS applications, the applicability of computer vision applications in ITS contexts, the advantages these technologies offer and the difficulties they present, and future research areas and trends, with the goal of increasing the effectiveness, efficiency, and safety level of ITS. The present review, which brings together research from various sources, aims to show how computer vision techniques can help transportation systems to become smarter by presenting a holistic picture of the literature on different CV applications in the ITS context.
2. Pedestrian gender classification on imbalanced and small sample datasets using deep and traditional features. Neural Comput Appl 2023. DOI: 10.1007/s00521-023-08331-4.
3. Density-based clustering with fully-convolutional networks for crowd flow detection from drones. Neurocomputing 2023. DOI: 10.1016/j.neucom.2023.01.059.
4. Determination of Non-Digestible Parts in Dairy Cattle Feces Using U-NET and F-CRN Architectures. Vet Sci 2023; 10:32. PMID: 36669033; PMCID: PMC9866369; DOI: 10.3390/vetsci10010032.
Abstract
Thanks to advances in image processing technology, deep learning algorithms can now be used to identify, locate, and count items in an image. The successful application of image processing in other fields has recently attracted attention in agriculture. This research was conducted to ascertain the number of indigestible cereal grains in animal feces using an image processing method. A regression-based approach to object counting was used to predict the number of cereal grains in the feces. For this purpose, we developed two neural network architectures based on Fully Convolutional Regression Networks (FCRN) and U-Net. The images used in the study were obtained from three dairy cattle enterprises operating in Nigde Province; the dataset consists of 277 distinct images of dairy cow droppings collected on these farms. According to the findings, both models yielded acceptable prediction accuracy, with U-Net performing slightly better: an MAE of 16.69 in the best case, compared with 23.65 for FCRN with the same batch.
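Regression-based counting of this kind typically trains a network to predict a density map whose integral equals the object count; the MAE figures above are then computed over per-image count errors. A minimal NumPy sketch of the counting and evaluation step (the function names and array shapes are illustrative, not taken from the paper):

```python
import numpy as np

def count_from_density_map(density_map):
    # In density-map regression, the predicted count is the
    # integral (sum) of the predicted density map.
    return float(np.sum(density_map))

def mean_absolute_error(predicted_counts, true_counts):
    # MAE over per-image count errors, the metric compared for U-Net vs. FCRN.
    predicted = np.asarray(predicted_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(predicted - true)))

# Toy example: two "images" with constant predicted density maps.
maps = [np.full((4, 4), 0.5), np.full((4, 4), 1.0)]   # sums: 8.0 and 16.0
preds = [count_from_density_map(m) for m in maps]
print(preds)                                  # [8.0, 16.0]
print(mean_absolute_error(preds, [10, 15]))   # 1.5
```

The same two helpers apply unchanged to any density-map counting model, which is why MAE is the standard comparison metric across the crowd counting entries below as well.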
5. Lian D, Chen X, Li J, Luo W, Gao S. Locating and Counting Heads in Crowds With a Depth Prior. IEEE Trans Pattern Anal Mach Intell 2022; 44:9056-9072. PMID: 34735337; DOI: 10.1109/tpami.2021.3124956.
Abstract
To simultaneously estimate the number of heads and locate them with bounding boxes, we resort to detection-based crowd counting by leveraging RGB-D data and design a dual-path guided detection network (DPDNet). Specifically, to improve the performance of detection-based approaches on dense/tiny heads, we propose a density map guided detection module, which leverages the density map to improve head/non-head classification in the detection network, where the density implies the probability of a pixel being a head; a depth-adaptive kernel that considers the variance in head sizes is also introduced to generate a high-fidelity density map for more robust density map regression. To prevent dense heads from being filtered out during post-processing, we use this density map in the post-processing of head detection and propose a density map guided NMS strategy. Meanwhile, to improve the detection of small heads, we propose a depth-guided detection module that generates a dynamic dilated convolution to extract features of heads at different scales, and a depth-aware anchor is further designed for better initialization of anchor sizes in the detection framework. We then use bounding boxes whose sizes are generated from depth to train our DPDNet. Since existing RGB-D datasets are too small for evaluating data-driven approaches, we collect two large-scale RGB-D crowd counting datasets: one synthetic and one real-world. Because depth values at long-distance positions cannot be obtained in the real-world dataset, we further propose a depth completion method based on meta learning, which fully utilizes the synthetic depth data to complete depth values at long-distance positions. Extensive experiments on our two RGB-D datasets and the MICC RGB-D counting dataset show that our method achieves the best performance for RGB-D crowd counting and localization. Furthermore, our method can easily be extended to RGB image based crowd counting and achieves comparable or even better performance on RGB datasets for both head counting and localization.
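The post-processing step this abstract modifies is standard greedy non-maximum suppression (NMS); the paper's density-guided variant relaxes suppression where the density map indicates a crowded region so that overlapping true heads are not discarded. A sketch of the baseline greedy NMS such a method builds on (pure NumPy; the density-guided adjustment itself is paper-specific and not reproduced here):

```python
import numpy as np

def iou(box, boxes):
    # Boxes are [x1, y1, x2, y2]; vectorised IoU of one box against many.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area(np.asarray(box)) + area(boxes) - inter
    return inter / union

def greedy_nms(boxes, scores, iou_threshold=0.5):
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]      # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop detections that overlap the kept box too strongly; a
        # density-guided variant would loosen this test in dense regions.
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))   # [0, 2]: the near-duplicate box 1 is suppressed
```

In very dense crowds this fixed-threshold rule is exactly what suppresses genuine neighbouring heads, which motivates conditioning the threshold on predicted density.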
6. Zheng Z, Ni N, Xie G, Zhu A, Wu Y, Yang T. HARNet: Hierarchical adaptive regression with location recovery for crowd counting. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.09.091.
7. Multi-object detection at night for traffic investigations based on improved SSD framework. Heliyon 2022; 8:e11570. PMID: 36439720; PMCID: PMC9691875; DOI: 10.1016/j.heliyon.2022.e11570.
Abstract
Despite significant progress in vision-based detection methods, detecting traffic objects at night remains challenging: visual information about medium and small stationary objects deteriorates under poor lighting conditions, yet this information is important for traffic investigations. To meet the needs of night traffic investigations, this study presents a nighttime multi-object detection framework based on the Single Shot MultiBox Detector (SSD), tailored to detecting traffic objects, especially medium and small stationary ones. In the framework, the Dense Convolutional Network (DenseNet) and deconvolutional layers are introduced to enhance feature reuse, and the effectiveness of this optimization is verified. Qualitative and quantitative experiments show that the presented framework detects medium and small stationary objects better and performs well for nighttime traffic investigations at intersections.
8. Owen D, Grammatikopoulou M, Luengo I, Stoyanov D. Automated identification of critical structures in laparoscopic cholecystectomy. Int J Comput Assist Radiol Surg 2022; 17:2173-2181. PMID: 36272018; DOI: 10.1007/s11548-022-02771-4.
Abstract
PURPOSE: Bile duct injury is a significant problem in laparoscopic cholecystectomy and can have grave consequences for patient outcomes. Automatic identification of the critical structures (cystic duct and cystic artery) could reduce complications during surgery by helping the surgeon establish the Critical View of Safety, and may eventually even provide real-time intra-operative guidance. METHODS: A computer vision model was trained to identify the critical structures. Label relaxation enabled the model to cope with ambiguous spatial extent and high annotation variability. Pseudo-label self-supervision allowed the model to use unlabelled data, which is particularly beneficial when labelled training data is scarce. Intrinsic variability in annotations was assessed across several annotators, quantifying the extent of annotation ambiguity and setting a baseline for model accuracy. RESULTS: Using 3050 labelled and 3682 unlabelled cholecystectomy frames, the model achieved an IoU of 65% and a presence detection F1 score of 75%. Inter-annotator IoU agreement was 70%, so the model is on average near human-level agreement on this dataset. The model's outputs were validated by three expert surgeons, who confirmed that they were accurate and promising for future use. CONCLUSION: Identification of critical structures can achieve high accuracy and is a promising step towards computer-assisted intervention, with potential applications in analytics and education. High accuracy and surgeon approval are maintained when the structures are detected separately as distinct classes. Future work will focus on guaranteeing safe identification of critical anatomy, including the bile duct, and validating the performance of automated approaches.
Affiliation(s)
- Danail Stoyanov
- Digital Surgery, Medtronic, London, UK; Wellcome / EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
9. Bai H, Mao J, Gary Chan SH. A survey on deep learning-based single image crowd counting: Network design, loss function and supervisory signal. Neurocomputing 2022. DOI: 10.1016/j.neucom.2022.08.037.
10. Guo X, Gao M, Zhai W, Shang J, Li Q. Spatial-Frequency Attention Network for Crowd Counting. Big Data 2022; 10:453-465. PMID: 35679590; DOI: 10.1089/big.2022.0039.
Abstract
Counting the number of people in crowded scenarios is a crucial task in video surveillance and urban security systems. Widely deployed surveillance cameras provide big data for training a compelling deep learning-based counting network. However, large-scale variations in dense crowds remain only partially solved. To address this problem, we propose a spatial-frequency attention network (SFANet) for crowd counting. A bottleneck spatial attention module emphasizes features at various spatial locations and adaptively selects regions containing individuals in the spatial domain. As a complement, in the frequency domain, a multispectral channel attention module obtains a more complete set of frequency components for representing each channel. The two attention modules are combined to focus on discriminative regions and suppress misleading information through mutual promotion. Experimental results on five benchmark crowd datasets demonstrate that SFANet achieves state-of-the-art performance in terms of accuracy and robustness.
Affiliation(s)
- Xiangyu Guo
- School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Mingliang Gao
- School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Wenzhe Zhai
- School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Jianrun Shang
- School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
- Qilei Li
- School of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
11. Gupta S, Kumar P, Tekchandani RK. Facial emotion recognition based real-time learner engagement detection system in online learning context using deep learning models. Multimed Tools Appl 2022; 82:11365-11394. PMID: 36105662; PMCID: PMC9461440; DOI: 10.1007/s11042-022-13558-9.
Abstract
The dramatic impact of the COVID-19 pandemic has resulted in the closure of physical classrooms, with teaching shifted to the online medium. To make the online learning environment as interactive as traditional offline classrooms, it is essential to ensure that students are properly engaged during online learning sessions. This paper proposes a deep learning-based approach that uses facial emotions to detect the real-time engagement of online learners by analysing students' facial expressions and classifying their emotions throughout the session. The facial emotion recognition information is used to calculate an engagement index (EI) that predicts two engagement states, "Engaged" and "Disengaged". Different deep learning models, namely Inception-V3, VGG19, and ResNet-50, are evaluated and compared to find the best predictive classification model for real-time engagement detection. Various benchmark datasets, such as FER-2013, CK+, and RAF-DB, are used to gauge the overall performance and accuracy of the proposed system. Experimental results showed that the system achieves accuracies of 89.11%, 90.14%, and 92.32% for Inception-V3, VGG19, and ResNet-50, respectively, on the benchmark datasets and our own dataset. ResNet-50 outperforms the others, with an accuracy of 92.32% for facial emotion classification in real-time learning scenarios.
Affiliation(s)
- Swadha Gupta
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India
- Parteek Kumar
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India
- Raj Kumar Tekchandani
- Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India
12. Luo Y, Lu J, Jiang X, Zhang B. Learning From Architectural Redundancy: Enhanced Deep Supervision in Deep Multipath Encoder-Decoder Networks. IEEE Trans Neural Netw Learn Syst 2022; 33:4271-4284. PMID: 33587717; DOI: 10.1109/tnnls.2021.3056384.
Abstract
Deep encoder-decoders are the model of choice for pixel-level estimation due to their redundant deep architectures. Yet they still suffer from the vanishing supervision information issue that affects convergence because of their overly deep architectures. In this work, we propose and theoretically derive an enhanced deep supervision (EDS) method which improves on conventional deep supervision (DS) by incorporating variance minimization into the optimization. A new structure variance loss is introduced to build a bridge between deep encoder-decoders and variance minimization, and provides a new way to minimize the variance by forcing different intermediate decoding outputs (paths) to reach an agreement. We also design a focal weighting strategy to effectively combine multiple losses in a scale-balanced way, so that the supervision information is sufficiently enforced throughout the encoder-decoders. To evaluate the proposed method on the pixel-level estimation task, a novel multipath residual encoder is proposed and extensive experiments are conducted on four challenging density estimation and crowd counting benchmarks. The experimental results demonstrate the superiority of our EDS over other paradigms, and improved estimation performance is reported using our deeply supervised encoder-decoder.
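The structure variance loss described above can be illustrated as a penalty on disagreement among the intermediate decoding paths: treating each path's output as one estimate, the loss is the pixel-wise variance across paths, and minimising it forces the paths to reach an agreement. A toy NumPy illustration (the function name and shapes are our own; the paper's full EDS objective also includes focal weighting, which is omitted here):

```python
import numpy as np

def structure_variance_loss(path_outputs):
    """Mean pixel-wise variance across K intermediate decoder outputs.

    path_outputs: array-like of shape (K, H, W), one density estimate
    per decoding path. Zero when all paths agree exactly.
    """
    outputs = np.asarray(path_outputs, dtype=float)
    # Variance is taken across the path axis, then averaged over pixels.
    return float(np.mean(np.var(outputs, axis=0)))

agreeing = np.stack([np.ones((2, 2)), np.ones((2, 2))])
disagreeing = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
print(structure_variance_loss(agreeing))      # 0.0  -- paths agree
print(structure_variance_loss(disagreeing))   # 0.25 -- variance of {0, 1} per pixel
```

In training, a term like this would be added to the usual per-path supervision losses, so gradients both fit the target and pull the paths toward consensus.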
13. Learning the cross-modal discriminative feature representation for RGB-T crowd counting. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109944.
14. Meng C, Kang C, Lyu L. Hierarchical feature aggregation network with semantic attention for counting large-scale crowd. Int J Intell Syst 2022. DOI: 10.1002/int.23023.
Affiliation(s)
- Chen Meng
- School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong, China
- Chunmeng Kang
- School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong, China
- Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China
- Lei Lyu
- School of Information Science and Engineering, Shandong Normal University, Jinan, Shandong, China
- Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Jinan, Shandong, China
15. Ren G, Lu X, Wang J, Li Y. Enhancement of Local Crowd Location and Count: Multiscale Counting Guided by Head RGB-Mask. Comput Intell Neurosci 2022; 2022:5708807. PMID: 36059394; PMCID: PMC9433205; DOI: 10.1155/2022/5708807.
Abstract
Background: In crowded scenes, traditional detection models often suffer from inaccurate multiscale target counts and a low recall rate. Methods: To address these two problems, this paper proposes an MLP-CNN model. Combined with an FPN feature pyramid, it fuses feature maps carrying low-resolution and high-resolution semantic information with little computation, effectively addressing inaccurate head counts for multiscale crowds. The MLP-CNN "mid-term" fusion model effectively fuses features from RGB head images and RGB-Mask images. With the help of head RGB-Mask annotation and adaptive Gaussian kernel regression, an enhanced density map can be generated, effectively addressing the low recall of head detection. Results: The MLP-CNN model was applied to ShanghaiTech, UCF_CC_50, and UCF-QNRF. The test results show that the error of the proposed method is significantly reduced, and the recall rate reaches 79.91%. Conclusion: The MLP-CNN model improves both the accuracy of crowd counting via density map regression and the detection rate of multiscale head targets.
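The "adaptive Gaussian kernel regression" mentioned here follows a recipe common in the crowd counting literature: each annotated head point is smeared with a Gaussian whose bandwidth scales with the distance to its nearest neighbour, so the kernel shrinks in dense regions. A simplified sketch under that assumption (the exact kernel rule and the `beta` factor are illustrative; the paper's variant may differ):

```python
import numpy as np

def adaptive_density_map(points, shape, beta=0.3):
    """Sum of unit-mass Gaussians; sigma = beta * nearest-neighbour distance."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=float)
    pts = np.asarray(points, dtype=float)
    for i, (px, py) in enumerate(pts):
        # Nearest-neighbour distance drives the kernel bandwidth, so the
        # Gaussian is narrow where heads are packed together.
        others = np.delete(pts, i, axis=0)
        if len(others):
            sigma = beta * np.min(np.hypot(others[:, 0] - px, others[:, 1] - py))
        else:
            sigma = beta * min(h, w)   # fallback for a lone point
        sigma = max(sigma, 1e-3)
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        density += g / g.sum()         # normalise so each head contributes 1
    return density

pts = [(8, 8), (10, 8), (40, 40)]
dmap = adaptive_density_map(pts, (64, 64))
print(round(dmap.sum(), 6))   # 3.0 -- the map integrates to the head count
```

Because each kernel is normalised to unit mass, the map's integral equals the annotation count regardless of how the bandwidths vary, which is what makes it a valid regression target.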
Affiliation(s)
- Guoyin Ren
- School of Mechanical Engineering, Inner Mongolia University of Science & Technology, Baotou 014010, China
- School of Information Engineering, Inner Mongolia University of Science & Technology, Baotou 014010, China
- Xiaoqi Lu
- School of Mechanical Engineering, Inner Mongolia University of Science & Technology, Baotou 014010, China
- Inner Mongolia University of Technology, Hohhot 010051, China
- Jingyu Wang
- School of Information Engineering, Inner Mongolia University of Science & Technology, Baotou 014010, China
- Yuhao Li
- School of Information Engineering, Inner Mongolia University of Science & Technology, Baotou 014010, China
16. Zhong X, Qin J, Guo M, Zuo W, Lu W. Offset-decoupled deformable convolution for efficient crowd counting. Sci Rep 2022; 12:12229. PMID: 35851829; PMCID: PMC9293988; DOI: 10.1038/s41598-022-16415-9.
Abstract
Crowd counting is considered a challenging issue in computer vision, and one of its most critical challenges is handling scale variation. CNN-based methods achieve better performance than other approaches; however, given the limits of fixed geometric structures, head-scale features are not completely captured. Deformable convolution with additional offsets is widely used in image classification and pattern recognition, as it can successfully exploit the potential of spatial information. However, because the offset parameters are randomly generated at network initialization, the sampling points of the deformable convolution are disorderly stacked, weakening the effectiveness of feature extraction. To handle the invalid learning of offsets and the inefficient use of deformable convolution, an offset-decoupled deformable convolution (ODConv) is proposed in this paper. It can completely capture information within the effective region of the sampling points, leading to better performance. In extensive experiments, our method achieves average MAE values of 62.3, 8.3, 91.9, and 159.3 on the ShanghaiTech A, ShanghaiTech B, UCF-QNRF, and UCF_CC_50 datasets, respectively, outperforming state-of-the-art methods and validating the effectiveness of the proposed ODConv.
Affiliation(s)
- Xin Zhong
- Department of Educational Technology, Ocean University of China, Qingdao 266100, China
- Jing Qin
- Department of Educational Technology, Ocean University of China, Qingdao 266100, China
- Mingyue Guo
- Department of Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
- Wangmeng Zuo
- Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Weigang Lu
- Department of Educational Technology, Ocean University of China, Qingdao 266100, China
17. Improved YOLOv4 for Pedestrian Detection and Counting in UAV Images. Comput Intell Neurosci 2022; 2022:6106853. PMID: 35875752; PMCID: PMC9303083; DOI: 10.1155/2022/6106853.
Abstract
Images captured by UAVs (unmanned aerial vehicles) contain small pedestrian targets, and key information is lost after multiple rounds of downsampling, which existing methods struggle to overcome. We propose an improved YOLOv4 model for pedestrian detection and counting in UAV images, named YOLO-CC. We use a lightweight YOLOv4 for pedestrian detection, replacing the backbone with CSPDarknet-34 and fusing two feature layers with an FPN (Feature Pyramid Network). We expand the receptive field using multiscale convolution on the high-level feature map and generate a crowd density map by feature dimension reduction. By embedding the density map generation method into the network for end-to-end training, our model effectively improves the accuracy of detection and counting and makes feature extraction focus more on small targets. Our experiments demonstrate that YOLO-CC achieves an AP50 21.76 points higher than that of the original YOLOv4 on the VisDrone2021-counting dataset while running faster.
18. Single-layer vision transformers for more accurate early exits with less overhead. Neural Netw 2022; 153:461-473. DOI: 10.1016/j.neunet.2022.06.038.
19. Deep Transfer Learning Enabled Intelligent Object Detection for Crowd Density Analysis on Video Surveillance Systems. Appl Sci (Basel) 2022. DOI: 10.3390/app12136665.
Abstract
Object detection is a computer vision-based technique used to detect instances of semantic objects of a particular class in digital images and videos, and crowd density analysis is one of its common applications. Since crowd density classification faces challenges such as non-uniform density, occlusion, and inter-scene and intra-scene deviations, convolutional neural network (CNN) models are useful. This paper presents a Metaheuristics with Deep Transfer Learning Enabled Intelligent Crowd Density Detection and Classification (MDTL-ICDDC) model for video surveillance systems, concentrating on the effective identification and classification of crowd density. To achieve this, the MDTL-ICDDC model primarily leverages a Salp Swarm Algorithm (SSA) with the NASNetLarge model for feature extraction, with the hyperparameter tuning process performed by the SSA. Furthermore, a weighted extreme learning machine (WELM) is used for crowd density classification. Finally, the krill swarm algorithm (KSA) is applied for effective parameter optimization, thereby improving the classification results. The experimental validation of the MDTL-ICDDC approach was carried out on a benchmark dataset and examined under several aspects. The results indicate that the MDTL-ICDDC system achieves better performance than models such as Gabor, BoW-SRP, BoW-LBP, GLCM-SVM, GoogleNet, and VGGNet.
20. Wide-Area Crowd Counting: Multi-view Fusion Networks for Counting in Large Scenes. Int J Comput Vis 2022. DOI: 10.1007/s11263-022-01626-4.
21. Quigley A, Nguyen PY, Stone H, Heslop DJ, Chughtai AA, MacIntyre CR. Estimated Mask Use and Temporal Relationship to COVID-19 Epidemiology of Black Lives Matter Protests in 12 Cities. J Racial Ethn Health Disparities 2022; 10:1212-1223. PMID: 35543865; PMCID: PMC9092928; DOI: 10.1007/s40615-022-01308-4.
Abstract
There is an increased risk of SARS-CoV-2 transmission during mass gatherings and a risk of asymptomatic infection. We aimed to estimate the use of masks during Black Lives Matter (BLM) protests and whether these protests increased the risk of COVID-19. Two reviewers screened 496 protest images for mask use, with high inter-rater reliability. Protest intensity, use of tear gas, government control measures, and testing rates were estimated in 12 cities. A correlation analysis was conducted to assess the potential effect of mask use and other measures, adjusting for testing rates, on COVID-19 epidemiology 4 weeks (two incubation periods) post-protests. Mask use ranged from 69 to 96% across protests. There was no increase in the incidence of COVID-19 post-protest in 11 cities. After adjusting for testing rates, only Miami, which involved use of tear gas and had high protest intensity, showed a clear increase in COVID-19 after one incubation period post-protest. No significant correlation was found between incidence and protest factors. Our study showed that protests in most cities studied did not increase COVID-19 incidence in 2020, and a high level of mask use was seen. The absence of an epidemic surge within two incubation periods of a protest is indicative that the protests did not have a major influence on epidemic activity, except in Miami. With the globally circulating highly transmissible Alpha, Delta, and Omicron variants, layered interventions such as mandated mask use, physical distancing, testing, and vaccination should be applied for mass gatherings in the future.
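The "high inter-rater reliability" between the two image reviewers is typically quantified with a statistic such as Cohen's kappa, which discounts the agreement expected by chance; the abstract does not name the statistic, so this is an assumption. A minimal sketch with invented labels (pure standard library):

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = sorted(set(rater_a) | set(rater_b))
    # Observed agreement: fraction of items both raters labelled identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_chance = sum(
        (rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels
    )
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical per-person mask-use judgments from two reviewers.
a = ["mask", "mask", "no", "mask", "no", "mask"]
b = ["mask", "mask", "no", "no", "no", "mask"]
print(round(cohens_kappa(a, b), 3))   # 0.667
```

Values near 1 indicate near-perfect agreement; values near 0 mean the raters agree no more often than chance.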
Affiliation(s)
- Ashley Quigley
- Biosecurity Research Program, The Kirby Institute, UNSW, Wallace Wurth Building, UNSW, High St, Kensington Campus, Sydney, NSW, 2052, Australia
- Phi Yen Nguyen
- School of Population Health, UNSW, Level 3, Samuels Building, UNSW, Sydney, NSW, 2052, Australia
- Haley Stone
- Biosecurity Research Program, The Kirby Institute, UNSW, Wallace Wurth Building, UNSW, High St, Kensington Campus, Sydney, NSW, 2052, Australia
- David J Heslop
- School of Population Health, UNSW, Level 3, Samuels Building, UNSW, Sydney, NSW, 2052, Australia
- Abrar Ahmad Chughtai
- School of Population Health, UNSW, Level 3, Samuels Building, UNSW, Sydney, NSW, 2052, Australia
- C Raina MacIntyre
- Biosecurity Research Program, The Kirby Institute, UNSW, Wallace Wurth Building, UNSW, High St, Kensington Campus, Sydney, NSW, 2052, Australia
22. Zhang M. Educational Psychology Analysis Method for Extracting Students' Facial Information Based on Image Big Data. Occup Ther Int 2022; 2022:8709591. PMID: 35645653; PMCID: PMC9117017; DOI: 10.1155/2022/8709591.
Abstract
At present, most of the research on academic emotions focuses on the concept, current situation, and relevance. There are not many researches on the application of artificial intelligence-based neural network facial expression recognition technology in practical teaching. With reference to image-based big data, this research integrates the application of artificial intelligence facial expression recognition technology with the research on educational theory and applies information technology to the actual teaching process, in order to promote the optimization of the teaching process and improve the learning effect. Method. A Hadoop cluster consisting of 3 nodes is built on the Linux system, and the environment required for Opencv execution is compiled for each node, which provides support for subsequent parallel optimization, feature extraction, feature fusion, and recognition of student facial images. The image data type and input and output format based on MapReduce framework are designed, and the image data is optimized by means of serialized files. The color features, texture features, and Sift features of students' facial images and common distractors were analyzed. A parallel extraction framework of student facial image features is designed, and based on this, the student facial image feature extraction under Hadoop platform is implemented. This paper proposes a dynamic sequential facial expression recognition method that combines shallow and deep features with an attention mechanism. The relative position of facial landmarks and local area texture features based on FACS represent shallow-level features. At the same time, the structure of ALexNet is improved to extract the deep features of sequence images to express high-level semantic features. The effectiveness of the facial expression recognition system is improved by introducing three attention mechanisms: self-attention, weight-attention, and convolutional attention. Results/Discussion. 
Analysis of the teaching effect showed that when teachers can accurately perceive students' academic emotions, they can intervene to reinforce students' positive academic emotions. After such interventions, students' academic emotions improved and were positively correlated with their academic performance, indicating that the study achieved its intended goal. The specific teaching results suggest that, in classroom teaching, teachers should devote effort to fostering students' positive academic emotions, which in turn improves academic performance and teaching quality.
Affiliation(s)
- Maoyue Zhang
- School of Law, Tianjin Normal University, Tianjin 300387, China
|
23
|
On-Board Crowd Counting and Density Estimation Using Low Altitude Unmanned Aerial Vehicles—Looking beyond Beating the Benchmark. REMOTE SENSING 2022. [DOI: 10.3390/rs14102288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Recent advances in deep learning-based image processing have enabled significant improvements in multiple computer vision fields, with crowd counting being no exception. Crowd counting is still attracting research interest due to its potential usefulness for traffic and pedestrian stream monitoring and analysis. This study considered a specific case of crowd counting, namely, counting based on low-altitude aerial images collected by an unmanned aerial vehicle. We evaluated a range of neural network architectures to find ones appropriate for on-board image processing using edge computing devices while minimising the loss in performance. Through experiments on a range of neural network architectures, we also showed that the input image resolution significantly impacts the prediction quality and should be considered an important factor before going for a more complex neural network model to improve accuracy. Moreover, by extending a state-of-the-art benchmark with more in-depth testing, we showed that larger models might be prone to overfitting because of the relative scarcity of training data.
|
24
|
Sindagi VA, Yasarla R, Patel VM. JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:2594-2609. [PMID: 33147141 DOI: 10.1109/tpami.2020.3035969] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations. In comparison to existing datasets, the proposed dataset is collected under a variety of diverse scenarios and environmental conditions. Specifically, the dataset includes several images with weather-based degradations and illumination variations, making it a very challenging dataset. Additionally, the dataset provides a rich set of annotations at both the image level and the head level. Several recent methods are evaluated and compared on this dataset. The dataset can be downloaded from http://www.crowd-counting.com. Furthermore, we propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation. The proposed method uses VGG16 as the backbone network and employs the density map generated by the final layer as a coarse prediction, which is refined into finer density maps in a progressive fashion using residual learning. Additionally, the residual learning is guided by an uncertainty-based confidence weighting mechanism that permits the flow of only high-confidence residuals in the refinement path. The proposed Confidence Guided Deep Residual Counting Network (CG-DRCN) is evaluated on recent complex datasets and achieves significant error reductions.
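The confidence-gated refinement idea can be illustrated schematically in NumPy; the maps and the threshold below are synthetic stand-ins, not the actual CG-DRCN layers:

```python
import numpy as np

rng = np.random.default_rng(1)
coarse = rng.random((8, 8))            # coarse density map from the backbone
target = coarse + 0.1                  # stand-in for the ground-truth map
residual = target - coarse             # residual a refinement branch would predict
confidence = rng.random((8, 8))        # per-pixel confidence in that residual

tau = 0.5                              # illustrative confidence threshold
trusted = confidence > tau
# only high-confidence residuals flow into the refined prediction
refined = coarse + np.where(trusted, residual, 0.0)
```

Pixels whose residual is trusted move toward the target; the rest keep the coarse prediction.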
|
25
|
Toha TR, Al-Nabhan NA, Salim SI, Rahaman M, Kamal U, Islam AAA. LC-Net: Localized Counting Network for extremely dense crowds. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
26
|
Bhuiyan MR, Abdullah J, Hashim N, Al Farid F, Ahsanul Haque M, Uddin J, Mohd Isa WN, Husen MN, Abdullah N. A deep crowd density classification model for Hajj pilgrimage using fully convolutional neural network. PeerJ Comput Sci 2022; 8:e895. [PMID: 35494812 PMCID: PMC9044363 DOI: 10.7717/peerj-cs.895] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2021] [Accepted: 01/26/2022] [Indexed: 06/14/2023]
Abstract
This research enhances crowd analysis by focusing on excessive crowd analysis and crowd density predictions for the Hajj and Umrah pilgrimages. Crowd analysis usually estimates the number of objects within an image or a video frame and is regularly solved by estimating the density generated from object location annotations. However, it suffers from low accuracy when the crowd is far away from the surveillance camera. This research proposes an approach to overcome the problem of estimating crowd density from surveillance footage captured at a distance. The proposed approach employs a fully convolutional neural network (FCNN)-based method for crowd analysis, especially for the classification of crowd density. This study aims to address the current technological challenges of video analysis in a scenario involving the movement of large numbers of pilgrims at densities ranging between 7 and 8 people per square meter. To address this challenge, a new dataset based on the Hajj pilgrimage scenario is developed. To validate the proposed method, the proposed model is compared with existing models using existing datasets. The proposed FCNN-based method achieved final accuracies of 100%, 98%, and 98.16% on the proposed dataset, the UCSD dataset, and the JHU-CROWD dataset, respectively. Additionally, the ResNet-based method obtained final accuracies of 97%, 89%, and 97% on the same three datasets. The proposed Hajj-Crowd-2021 crowd analysis dataset and model outperformed the other state-of-the-art datasets and models in most cases.
Affiliation(s)
- Md Roman Bhuiyan
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Junaidi Abdullah
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Noramiza Hashim
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Fahmid Al Farid
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Mohammad Ahsanul Haque
- Data Scientist and Machine Learning Developer, Aalborg University, Aalborg, Denmark
- Jia Uddin
- Technology Studies Department, Woosong University, Daejeon, South Korea
- Mohd Nizam Husen
- Information Technology, Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur, Malaysia
- Norra Abdullah
- Computer Science, WSA Venture Australia (M) Sdn Bhd, Cyberjaya, Malaysia
|
27
|
On the Reliability of CNNs in Clinical Practice: A Computer-Aided Diagnosis System Case Study. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12073269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Leukocyte classification is essential to assess their number and status, since they are the body's first defence against infection and disease. Automation of the process can reduce the laborious manual review and diagnosis by operators and has been the subject of study for at least two decades. Most computer-aided systems exploit convolutional neural networks for classification purposes without any intermediate step before producing a classification. This work explores the current limitations of deep learning-based methods applied to medical blood smear data, considering leukocyte analysis oriented towards leukaemia prediction as a case study. Specifically, we aim to demonstrate that a single classification step can lead to incorrect predictions or, worse, to correct predictions obtained from the wrong image cues. By generating new synthetic leukocyte data, we show that including a fine-grained step, such as detection or segmentation, before classification is essential to allow the network to correctly attend to the relevant information on individual white blood cells. The effectiveness of this study is thoroughly analysed and quantified through a series of experiments on a public dataset of blood smears taken under a microscope. Experimental results show that residual networks perform statistically better in this scenario, even though they can make correct predictions with incorrect information.
|
28
|
Li P, Zhang M, Wan J, Jiang M. DMPNet: densely connected multi-scale pyramid networks for crowd counting. PeerJ Comput Sci 2022; 8:e902. [PMID: 35494810 PMCID: PMC9044264 DOI: 10.7717/peerj-cs.902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2021] [Accepted: 02/07/2022] [Indexed: 06/14/2023]
Abstract
Crowd counting has been widely studied with deep learning in recent years. However, due to scale variation caused by perspective distortion, it remains a challenging task. In this paper, we propose a Densely Connected Multi-scale Pyramid Network (DMPNet) for count estimation and the generation of high-quality density maps. The key component of our network is the Multi-scale Pyramid Network (MPN), which can extract multi-scale crowd features effectively while keeping the resolution and channel count of the input feature map unchanged. To increase information transfer between network layers, we use dense connections to connect multiple MPNs. In addition, we design a novel loss function that helps our model achieve better convergence. To evaluate our method, we conducted extensive experiments on three challenging benchmark crowd counting datasets. Experimental results show that, compared with state-of-the-art algorithms, DMPNet performs competitively in both parameter count and accuracy. The code is available at: https://github.com/lpfworld/DMPNet.
Affiliation(s)
- Pengfei Li
- Computer & Software School, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Min Zhang
- Computer & Software School, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Jian Wan
- Computer & Software School, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
- Ming Jiang
- Computer & Software School, Hangzhou Dianzi University, Hangzhou, Zhejiang, China
|
29
|
Shi Y, Sang J, Wu Z, Wang F, Liu X, Xia X, Sang N. MGSNet: A multi-scale and gated spatial attention network for crowd counting. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03263-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
30
|
A Traffic Event Detection Method Based on Random Forest and Permutation Importance. MATHEMATICS 2022. [DOI: 10.3390/math10060873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]
Abstract
Although video surveillance systems play an important role in intelligent transportation, limited camera views make it difficult to observe many traffic events. In this paper, we collect and combine traffic flow variables from multi-source sensors and propose the PITED method, based on Random Forest (RF) and permutation importance (PI), for traffic event detection. The model selects suitable traffic flow variables by ranking their permutation importance, and establishes the whole process of acquisition, preprocessing, quantization, modeling, and evaluation. Moreover, real traffic data are collected and used to evaluate experimental performance, including the miss and false alarm rates for traffic events and the average detection time. The experimental results show a detection rate of more than 85% and a false alarm rate of less than 3%, indicating that the model is effective and efficient in practical applications on both workdays and holidays.
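The variable-selection step described above can be sketched with scikit-learn; the features, labels, and data below are synthetic stand-ins, not the paper's traffic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
# synthetic "traffic flow variables": columns 0 and 1 carry the signal,
# the remaining three columns are pure noise
X = rng.normal(size=(n, 5))
y = ((X[:, 0] - X[:, 1]) > 0).astype(int)  # event iff speed drop exceeds occupancy rise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# rank variables by mean accuracy drop over repeated feature shuffles
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
ranked = np.argsort(imp.importances_mean)[::-1]
```

Keeping only the top-ranked variables is the selection step; the noise columns fall to the bottom of the ranking.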
|
31
|
Wan J, Wang Q, Chan AB. Kernel-Based Density Map Generation for Dense Object Counting. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:1357-1370. [PMID: 32903177 DOI: 10.1109/tpami.2020.3022878] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Crowd counting is an essential topic in computer vision due to its practical usage in surveillance systems. The typical design of crowd counting algorithms is divided into two steps. First, the ground-truth density maps of crowd images are generated from the ground-truth dot maps (density map generation), e.g., by convolving with a Gaussian kernel. Second, deep learning models are designed to predict a density map from an input image (density map estimation). The density map based counting methods that incorporate density map as the intermediate representation have improved counting performance dramatically. However, in the sense of end-to-end training, the hand-crafted methods used for generating the density maps may not be optimal for the particular network or dataset used. To address this issue, we propose an adaptive density map generator, which takes the annotation dot map as input, and learns a density map representation for a counter. The counter and generator are trained jointly within an end-to-end framework. We also show that the proposed framework can be applied to general dense object counting tasks. Extensive experiments are conducted on 10 datasets for 3 applications: crowd counting, vehicle counting, and general object counting. The experiment results on these datasets confirm the effectiveness of the proposed learnable density map representations.
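The hand-crafted density-map generation step this paper improves on can be sketched in a few lines; the head positions and sigma below are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

h, w = 64, 64
dot_map = np.zeros((h, w))
# three annotated head positions, kept well away from the borders
for r, c in [(20, 20), (32, 40), (45, 25)]:
    dot_map[r, c] = 1.0

# fixed-bandwidth Gaussian kernel; the paper's point is that this
# hand-crafted choice of sigma may be suboptimal for a given network
density = gaussian_filter(dot_map, sigma=4.0, mode="constant")
count = density.sum()  # integrating the density map recovers the count
```

The Gaussian spreads each dot into a unit-mass blob, so summing the map returns (approximately) the number of annotated objects.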
|
32
|
Fan Z, Zhang H, Zhang Z, Lu G, Zhang Y, Wang Y. A survey of crowd counting and density estimation based on convolutional neural network. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.02.103] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
33
|
Liu Q, Guo Y, Sang J, Tan J, Wang F, Tian S. SGCNet: Scale-aware and global contextual network for crowd counting. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03230-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
34
|
Unlocking the Potential of Deep Learning for Migratory Waterbirds Monitoring Using Surveillance Video. REMOTE SENSING 2022. [DOI: 10.3390/rs14030514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
Estimates of migratory waterbird populations provide the essential scientific basis to guide the conservation of coastal wetlands, which are heavily modified and threatened by economic development. New equipment and technology have been increasingly introduced in protected areas to expand monitoring efforts, among which video surveillance and other unmanned devices are widely used in coastal wetlands. However, the massive amount of video records brings the dual challenge of storage and analysis. Manual analysis methods are time-consuming and error-prone, representing a significant bottleneck to rapid data processing and to the dissemination and application of results. Recently, video processing with deep learning has emerged as a solution, but its ability to accurately identify and count waterbirds across habitat types (e.g., mudflat, saltmarsh, and open water) is untested in coastal environments. In this study, we developed a two-step automatic waterbird monitoring framework. The first step involves automatic video segmentation, selection, processing, and mosaicking of video footage into panorama images covering the entire monitoring area; the second step performs counting and density estimation using a depth density estimation network (DDE). We tested the effectiveness and performance of the framework in Tiaozini, Jiangsu Province, China, a restored wetland providing key high-tide roosting ground for migratory waterbirds on the East Asian–Australasian flyway. The results showed that our approach achieved an accuracy of 85.59%, outperforming many other popular deep learning algorithms. Furthermore, the standard error of our model was very small (se = 0.0004), suggesting high stability. The framework is also computationally efficient: it takes about one minute to process a theme covering the entire site using a high-performance desktop computer. These results demonstrate that our framework can accurately extract ecologically meaningful data and information from video surveillance footage to assist biodiversity monitoring, filling the gap in the efficient use of existing monitoring equipment deployed in protected areas.
|
35
|
|
36
|
|
37
|
Synthetic Generation of Passive Infrared Motion Sensor Data Using a Game Engine. SENSORS 2021; 21:s21238078. [PMID: 34884081 PMCID: PMC8662402 DOI: 10.3390/s21238078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Revised: 11/19/2021] [Accepted: 11/28/2021] [Indexed: 11/22/2022]
Abstract
Quantifying the number of occupants in an indoor space is useful for a wide variety of applications. Attempts have been made at solving the task using passive infrared (PIR) motion sensor data together with supervised learning methods. Collecting a large labeled dataset containing both PIR motion sensor data and ground truth people count is however time-consuming, often requiring one hour of observation for each hour of data gathered. In this paper, a method is proposed for generating such data synthetically. A simulator is developed in the Unity game engine capable of producing synthetic PIR motion sensor data by detecting simulated occupants. The accuracy of the simulator is tested by replicating a real-world meeting room inside the simulator and conducting an experiment where a set of choreographed movements are performed in the simulated environment as well as the real room. In 34 out of 50 tested situations, the output from the simulated PIR sensors is comparable to the output from the real-world PIR sensors. The developed simulator is also used to study how a PIR sensor’s output changes depending on where in a room a motion is carried out. Through this, the relationship between sensor output and spatial position of a motion is discovered to be highly non-linear, which highlights some of the difficulties associated with mapping PIR data to occupancy count.
|
38
|
Hoekendijk JPA, Kellenberger B, Aarts G, Brasseur S, Poiesz SSH, Tuia D. Counting using deep learning regression gives value to ecological surveys. Sci Rep 2021; 11:23209. [PMID: 34853327 PMCID: PMC8636638 DOI: 10.1038/s41598-021-02387-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Accepted: 11/10/2021] [Indexed: 12/03/2022] Open
Abstract
Many ecological studies rely on count data and involve manual counting of objects of interest, which is time-consuming and especially disadvantageous when time in the field or lab is limited. However, an increasing number of works uses digital imagery, which opens opportunities to automatise counting tasks. In this study, we use machine learning to automate counting objects of interest without the need to label individual objects. By leveraging already existing image-level annotations, this approach can also give value to historical data that were collected and annotated over longer time series (typical for many ecological studies), without the aim of deep learning applications. We demonstrate deep learning regression on two fundamentally different counting tasks: (i) daily growth rings from microscopic images of fish otolith (i.e., hearing stone) and (ii) hauled out seals from highly variable aerial imagery. In the otolith images, our deep learning-based regressor yields an RMSE of 3.40 day-rings and an R² of 0.92. Initial performance on the seal images is lower (RMSE of 23.46 seals and R² of 0.72), which can be attributed to a lack of images with a high number of seals in the initial training set, compared to the test set. We then show how to improve performance substantially (RMSE of 19.03 seals and R² of 0.77) by carefully selecting and relabelling just 100 additional training images based on initial model prediction discrepancy. The regression-based approach used here returns accurate counts (R² of 0.92 and 0.77 for the rings and seals, respectively), directly usable in ecological research.
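The two metrics reported for count regression can be computed directly; the counts below are made-up numbers for illustration:

```python
import numpy as np

y_true = np.array([10.0, 20.0, 30.0, 40.0])   # illustrative ground-truth counts
y_pred = np.array([12.0, 18.0, 33.0, 39.0])   # illustrative model predictions

# root-mean-square error of the predicted counts
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
# coefficient of determination: 1 minus residual over total sum of squares
r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
```

RMSE is in the units of the counts themselves (rings, seals), while R² is scale-free, which is why the paper reports both.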
Affiliation(s)
- Jeroen P A Hoekendijk
- NIOZ Royal Netherlands Institute for Sea Research, 1790AB, Den Burg, The Netherlands.
- Wageningen University and Research, 6708PB, Wageningen, The Netherlands.
- Geert Aarts
- NIOZ Royal Netherlands Institute for Sea Research, 1790AB, Den Burg, The Netherlands.
- Wageningen Marine Research, Wageningen University and Research, 1781AG, Den Helder, The Netherlands.
- Wageningen University and Research, Wildlife Ecology and Conservation Group, 6708 PB, Wageningen, The Netherlands.
- Sophie Brasseur
- Wageningen Marine Research, Wageningen University and Research, 1781AG, Den Helder, The Netherlands.
- Suzanne S H Poiesz
- NIOZ Royal Netherlands Institute for Sea Research, 1790AB, Den Burg, The Netherlands.
- Groningen Institute of Evolutionary Life Sciences, University of Groningen, 9700 CC, Groningen, The Netherlands.
- Devis Tuia
- Ecole Polytechnique Fédérale de Lausanne (EPFL), 1950, Sion, Switzerland.
|
39
|
Li P, Zhang M, Wan J, Jiang M. Multiscale Aggregate Networks with Dense Connections for Crowd Counting. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:9996232. [PMID: 34804153 PMCID: PMC8601827 DOI: 10.1155/2021/9996232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Revised: 10/16/2021] [Accepted: 10/28/2021] [Indexed: 11/18/2022]
Abstract
The most advanced method for crowd counting uses a fully convolutional network that extracts image features and then generates a crowd density map. However, this process often encounters multiscale and contextual loss problems. To address these problems, we propose a multiscale aggregation network (MANet) that includes a feature extraction encoder (FEE) and a density map decoder (DMD). The FEE uses a cascaded scale pyramid network to extract multiscale features and obtains contextual features through dense connections. The DMD uses deconvolution and fusion operations to generate features containing detailed information. These features can be further converted into high-quality density maps to accurately calculate the number of people in a crowd. An empirical comparison using four mainstream datasets (ShanghaiTech, WorldExpo'10, UCF_CC_50, and SmartCity) shows that the proposed method is more effective in terms of the mean absolute error and mean squared error. The source code is available at https://github.com/lpfworld/MANet.
Affiliation(s)
- Pengfei Li
- Hangzhou Dianzi University, Baiyang Road No. 2, Hangzhou, China
- Min Zhang
- Hangzhou Dianzi University, Baiyang Road No. 2, Hangzhou, China
- Jian Wan
- Hangzhou Dianzi University, Baiyang Road No. 2, Hangzhou, China
- Ming Jiang
- Hangzhou Dianzi University, Baiyang Road No. 2, Hangzhou, China
|
40
|
Zhou Y, Yang J, Li H, Cao T, Kung SY. Adversarial Learning for Multiscale Crowd Counting Under Complex Scenes. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:5423-5432. [PMID: 31905157 DOI: 10.1109/tcyb.2019.2956091] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In this article, a multiscale generative adversarial network (MS-GAN) is proposed for generating high-quality crowd density maps of arbitrary crowd density scenes. Crowd counting faces many challenges, such as severe occlusions in extremely dense crowd scenes, perspective distortion, and high visual similarity between pedestrians and background elements. To address these problems, the proposed MS-GAN combines a multiscale convolutional neural network (generator) and an adversarial network (discriminator) to generate a high-quality density map and accurately estimate the crowd count in complex crowd scenes. The multiscale generator utilizes fused features from multiple hierarchical layers to detect people with large-scale variation. The resulting density map produced by the multiscale generator is processed by a discriminator network trained to distinguish poor-quality density maps from real ground-truth ones. The additional adversarial loss improves the quality of the density map, which is critical for accurately estimating crowd counts. Experiments were conducted on multiple datasets with different crowd scenes and densities, and the results showed that the proposed method outperforms current state-of-the-art methods.
|
41
|
Dutta HS, Jobanputra M, Negi H, Chakraborty T. Detecting and Analyzing Collusive Entities on YouTube. ACM T INTEL SYST TEC 2021. [DOI: 10.1145/3477300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
YouTube sells advertisements on the posted videos, which in turn enables the content creators to monetize their videos. As an unintended consequence, this has proliferated various illegal activities such as artificial boosting of views, likes, comments, and subscriptions. We refer to such videos (gaining likes and comments artificially) and channels (gaining subscriptions artificially) as "collusive entities." Detecting such collusive entities is an important yet challenging task. Existing solutions mostly deal with the problem of spotting fake views, spam comments, fake content, and so on, and oftentimes ignore how such fake activities emerge via collusion. Here, we collect a large dataset consisting of two types of collusive entities on YouTube: videos submitted to gain collusive likes and comment requests, and channels submitted to gain collusive subscriptions. We begin by providing an in-depth analysis of collusive entities on YouTube fostered by various blackmarket services. Following this, we propose models to detect three types of collusive YouTube entities: videos seeking collusive likes, channels seeking collusive subscriptions, and videos seeking collusive comments. The third type of entity is associated with temporal information. To detect videos and channels for collusive likes and subscriptions, respectively, we utilize one-class classifiers trained on our curated collusive entities and a set of novel features. The SVM-based model shows significant performance with a true positive rate of 0.911 and 0.910 for detecting collusive videos and collusive channels, respectively. To detect videos seeking collusive comments, we propose CollATe, a novel end-to-end neural architecture that leverages time-series information of posted comments along with static metadata of videos. CollATe is composed of three components: a metadata feature extractor (which derives metadata-based features from videos), an anomaly feature extractor (which utilizes the time-series data to detect sudden changes in commenting activity), and a comment feature extractor (which utilizes the text of the comments posted during collusion and computes a similarity score between the comments). Extensive experiments show the effectiveness of CollATe (with a true positive rate of 0.905) over the baselines.
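The one-class classification setup used for collusive likes and subscriptions can be sketched with scikit-learn's OneClassSVM; the four-dimensional features below are synthetic stand-ins for the paper's entity features, which are not specified here:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# training set: known collusive entities, clustered in feature space
collusive = rng.normal(loc=0.0, scale=0.3, size=(200, 4))
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(collusive)

# unseen entities far from the collusive cluster
probes = rng.normal(loc=5.0, scale=0.3, size=(5, 4))
preds = model.predict(probes)  # +1 = inlier (collusive-like), -1 = outlier
```

The one-class formulation fits only the positive (collusive) class, so entities that fall far from the learned region are rejected as non-collusive.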
|
42
|
An Image-Based Steel Rebar Size Estimation and Counting Method Using a Convolutional Neural Network Combined with Homography. BUILDINGS 2021. [DOI: 10.3390/buildings11100463] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
Conventionally, the number of steel rebars at construction sites is manually counted by workers. However, this practice gives rise to several problems: it is slow, human-resource-intensive, time-consuming, error-prone, and not very accurate. Consequently, a new method of quickly and accurately counting steel rebars with a minimal number of workers needs to be developed to enhance work efficiency and reduce labor costs at construction sites. In this study, the authors developed an automated system to estimate the size and count the number of steel rebars in bale packing using computer vision techniques based on a convolutional neural network (CNN). A dataset containing 622 images of rebars, with a total of 186,522 rebar cross sections and 409 poly tags, was established for segmenting rebars and poly tags in images. The images were collected at a full HD resolution of 1920 × 1080 pixels and then center-cropped to 512 × 512 pixels. Moreover, data augmentation was carried out to create 4668 images for the training dataset. Based on the training dataset, a YOLACT-based steel rebar size estimation and counting model with Box and Mask mAP above 30 was generated to satisfy the aim of this study. The proposed method, a CNN model combined with homography, can estimate the size and count of steel rebars in an image quickly and accurately, and it can be applied to real construction sites to efficiently manage steel rebar stock.
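The homography step can be sketched with a standard direct linear transform; the point correspondences below are invented for illustration and do not come from the paper:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform (DLT) from 4+ point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the homography is the null vector of A (last right-singular vector)
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map 2-D points through H using homogeneous coordinates."""
    pts = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:]

# four corners of a reference tag of known size: image pixels -> millimetres
src = np.array([[100.0, 120.0], [400.0, 130.0], [390.0, 420.0], [110.0, 410.0]])
dst = np.array([[0.0, 0.0], [80.0, 0.0], [80.0, 80.0], [0.0, 80.0]])
H = fit_homography(src, dst)

centres_px = np.array([[250.0, 270.0]])      # a detected rebar cross-section centre
centres_mm = apply_homography(H, centres_px)  # same point on the metric plane
```

Mapping detected cross sections onto a metric plane is what lets pixel measurements be converted into physical rebar sizes.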
|
43
|
Gender Classification Using Proposed CNN-Based Model and Ant Colony Optimization. MATHEMATICS 2021. [DOI: 10.3390/math9192499] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Pedestrian gender classification is one of the key assignments of pedestrian study, and it finds practical applications in content-based image retrieval, population statistics, human–computer interaction, health care, multimedia retrieval systems, demographic collection, and visual surveillance. In this research work, gender classification was carried out using a deep learning approach. A new 64-layer architecture named 4-BSMAB, derived from deep AlexNet, is proposed. The proposed model was trained on the CIFAR-100 dataset using a SoftMax classifier. Features were then obtained from the applied datasets with this pre-trained model, and the resulting feature set was optimized with the ant colony system (ACS) optimization technique. Various SVM and KNN classifiers were used to perform gender classification on the optimized feature set. Comprehensive experimentation was performed on gender classification datasets, and the proposed model produced better results than existing methods. The suggested model attained the highest accuracy (85.4%) and 92% AUC on the MIT dataset, and the best classification results (93% accuracy and 96% AUC) on the PKU-Reid dataset. The outcomes of extensive experiments on existing standard pedestrian datasets demonstrate that the proposed framework outperforms existing pedestrian gender classification methods and is robust.
Collapse
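The ACS feature-selection step described above can be illustrated with a toy ant-colony search: ants sample feature subsets with probability growing with pheromone, and subsets that classify well deposit more pheromone. This is a simplified sketch (nearest-centroid fitness instead of the paper's SVM/KNN, made-up data), not the authors' ACS implementation.

```python
import random

def nearest_centroid_accuracy(features, labels, subset):
    """Accuracy of a nearest-centroid classifier restricted to `subset` dims."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        pts = [f for f, l in zip(features, labels) if l == c]
        centroids[c] = [sum(p[j] for p in pts) / len(pts) for j in subset]
    correct = 0
    for x, y in zip(features, labels):
        pred = min(classes, key=lambda c: sum(
            (x[j] - cj) ** 2 for j, cj in zip(subset, centroids[c])))
        correct += pred == y
    return correct / len(features)

def acs_select(features, labels, n_select, n_ants=10, n_iter=20, rho=0.1, seed=0):
    """Toy ant-colony-system feature selection."""
    rng = random.Random(seed)
    d = len(features[0])
    pheromone = [1.0] * d
    best_subset, best_fit = None, -1.0
    for _ in range(n_iter):
        trails = []
        for _ in range(n_ants):
            # weighted pick: larger pheromone -> more likely to be chosen
            order = sorted(range(d), key=lambda j: -pheromone[j] * rng.random())
            subset = sorted(order[:n_select])
            fit = nearest_centroid_accuracy(features, labels, subset)
            trails.append((fit, subset))
            if fit > best_fit:
                best_fit, best_subset = fit, subset
        pheromone = [(1.0 - rho) * p for p in pheromone]  # evaporation
        for fit, subset in trails:                        # deposit
            for j in subset:
                pheromone[j] += fit
    return best_subset, best_fit

# Toy data: feature 0 separates the classes, feature 1 is noise.
features = [[0.0, 0.9], [0.1, 0.1], [0.2, 0.5],
            [1.0, 0.4], [1.1, 0.8], [0.9, 0.2]]
labels = ['a', 'a', 'a', 'b', 'b', 'b']
subset, fit = acs_select(features, labels, n_select=1)
```

On this toy data the search converges on the discriminative feature, which is the behaviour the ACS step exploits on real CNN feature sets.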
|
44
|
Liu X, Sang J, Wu W, Liu K, Liu Q, Xia X. Density-aware and background-aware network for crowd counting via multi-task learning. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.07.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
45
|
Litrico M, Battiato S, Tsaftaris SA, Giuffrida MV. Semi-Supervised Domain Adaptation for Holistic Counting under Label Gap. J Imaging 2021; 7:jimaging7100198. [PMID: 34677284 PMCID: PMC8541592 DOI: 10.3390/jimaging7100198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2021] [Revised: 09/20/2021] [Accepted: 09/21/2021] [Indexed: 11/16/2022] Open
Abstract
This paper proposes a novel approach to semi-supervised domain adaptation for holistic regression tasks, where a DNN predicts a continuous value y ∈ R given an input image x. The literature generally lacks domain adaptation approaches specific to this task, as most focus on classification. In the context of holistic regression, most real-world datasets exhibit not only a covariate (or domain) shift but also a label gap: the target dataset may contain labels not included in the source dataset (and vice versa). We propose an approach tackling both covariate shift and label gap in a unified training framework. Specifically, a Generative Adversarial Network (GAN) is used to reduce covariate shift, and the label gap is mitigated via label normalisation. To avoid overfitting, we propose a stopping criterion that simultaneously exploits the Maximum Mean Discrepancy and the GAN Global Optimality condition. To restore the original label range, which was previously normalised, a handful of annotated images from the target domain are used. Our experimental results on three different datasets demonstrate that our approach drastically outperforms the state of the art across the board. Specifically, for the cell counting problem, the mean squared error (MSE) is reduced from 759 to 5.62; on the pedestrian dataset, our approach lowers the MSE from 131 to 1.47. For the last experimental setup, we borrow a task from plant biology, counting the number of leaves in a plant, and run two series of experiments, showing that the MSE is reduced from 2.36 to 0.88 (intra-species) and from 1.48 to 0.6 (inter-species).
Collapse
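The label-normalisation idea above can be sketched with a toy example: labels are normalised against the source range for training, and afterwards a handful of annotated target images are used to fit a linear map that restores predictions to the target's true range. This is a minimal illustration under assumed numbers, not the authors' exact procedure.

```python
def normalise(y, y_min, y_max):
    """Map a raw count into [0, 1] using the source label range."""
    return (y - y_min) / (y_max - y_min)

def fit_linear_restore(pred_norm, true_counts):
    """Least-squares fit of true = a * pred_norm + b from a handful of
    annotated target images, used to undo the normalisation."""
    n = len(pred_norm)
    mx = sum(pred_norm) / n
    my = sum(true_counts) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pred_norm, true_counts))
    var = sum((x - mx) ** 2 for x in pred_norm)
    a = cov / var
    b = my - a * mx
    return a, b

# Three annotated target images give (normalised prediction, true count).
# Here the target range is wider than the source's, i.e. a label gap.
anchors = [(0.10, 25.0), (0.50, 125.0), (0.90, 225.0)]
a, b = fit_linear_restore([p for p, _ in anchors], [t for _, t in anchors])
restored = a * 0.30 + b   # restore a new normalised prediction
```

With only these three anchors the fitted map recovers counts outside the source label range, which is exactly what the label gap requires.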
Affiliation(s)
- Mattia Litrico
- Department of Mathematics and Computer Science, University of Catania, 95125 Catania, Italy; (M.L.); (S.B.)
- Sebastiano Battiato
- Department of Mathematics and Computer Science, University of Catania, 95125 Catania, Italy; (M.L.); (S.B.)
- Mario Valerio Giuffrida
- School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK
- Correspondence: ; Tel.: +44-131-455-2744
Collapse
|
46
|
Advances in Convolution Neural Networks Based Crowd Counting and Density Estimation. BIG DATA AND COGNITIVE COMPUTING 2021. [DOI: 10.3390/bdcc5040050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Automatically estimating the number of people in unconstrained scenes is a crucial yet challenging task in many real-world applications, including video surveillance, public safety, urban planning, and traffic monitoring. In addition, methods developed to estimate the number of people can be adapted and applied to related tasks in other fields, such as plant counting, vehicle counting, and cell microscopy. Crowd counting faces many challenges, including cluttered scenes, extreme occlusion, scale variation, and changes in camera perspective. In the past few years, tremendous research effort has therefore been devoted to crowd counting, and numerous excellent techniques have been proposed. The significant recent progress in crowd counting methods is mostly attributable to advances in deep convolutional neural networks (CNNs) as well as to public crowd counting datasets. In this work, we review the papers published in the last decade and provide a comprehensive survey of recent CNN-based crowd counting techniques. We briefly review detection-based, regression-based, and traditional density-estimation-based approaches, then delve into deep-learning-based density estimation approaches and recently published datasets. In addition, we discuss the potential applications of crowd counting, in particular those using unmanned aerial vehicle (UAV) images.
Collapse
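The density-estimation formulation surveyed above can be shown in a few lines: each annotated head contributes a unit-mass Gaussian to a density map, so integrating (summing) the map recovers the head count. A minimal sketch with assumed grid size and kernel parameters:

```python
import math

def density_map(points, h, w, sigma=1.5, radius=4):
    """Place one unit-mass Gaussian per annotated head position (y, x);
    summing the resulting map recovers the head count."""
    dmap = [[0.0] * w for _ in range(h)]
    for (cy, cx) in points:
        kernel, mass = [], 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = cy + dy, cx + dx
                if 0 <= y < h and 0 <= x < w:
                    g = math.exp(-(dy * dy + dx * dx) / (2 * sigma * sigma))
                    kernel.append((y, x, g))
                    mass += g
        for y, x, g in kernel:   # normalise so each head contributes 1.0
            dmap[y][x] += g / mass
    return dmap

heads = [(10, 10), (10, 30), (25, 20)]       # annotated head positions
dmap = density_map(heads, h=40, w=40)
count = sum(sum(row) for row in dmap)        # recovers 3.0
```

CNN-based counters regress such a map from the image; the per-kernel normalisation is what makes the map's integral equal the count even when Gaussians are clipped at the image border.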
|
47
|
Zhang B, Wang N, Zhao Z, Abraham A, Liu H. Crowd counting based on attention-guided multi-scale fusion networks. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.045] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
48
|
Filipic J, Biagini M, Mas I, Pose CD, Giribet JI, Parisi DR. People counting using visible and infrared images. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
49
|
A Wide Area Multiview Static Crowd Estimation System Using UAV and 3D Training Simulator. REMOTE SENSING 2021. [DOI: 10.3390/rs13142780] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Crowd size estimation is a challenging problem, especially when the crowd is spread over a large geographical area. It has applications in monitoring rallies and demonstrations and in calculating assistance requirements in humanitarian disasters. Building a crowd surveillance system for large crowds is therefore a significant challenge. UAV-based techniques are an appealing choice for crowd estimation over a large region, but they present a variety of interesting challenges, such as integrating per-frame estimates across a video without counting individuals twice. Large quantities of annotated training data are required to design, train, and test such a system. In this paper, we first review several crowd estimation techniques and the existing crowd simulators and datasets available for crowd analysis. We then describe a simulation system that provides such data, avoiding the need for tedious and error-prone manual annotation, and evaluate synthetic video from the simulator using various existing single-frame crowd estimation techniques. Our findings show that the simulated data can be used to train and test crowd estimation methods, providing a suitable platform for developing such techniques. We also propose an automated UAV-based 3D crowd estimation system for approximately static or slow-moving crowds, such as public events, political rallies, and natural or man-made disasters. We evaluate the results by applying our new framework to a variety of scenarios with varying crowd sizes. The proposed system gives promising results under widely accepted metrics, including MAE, RMSE, Precision, Recall, and F1 score.
Collapse
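Two of the count-evaluation metrics named above, MAE and RMSE, can be computed over per-frame counts as follows (the prediction and ground-truth numbers are made up for illustration):

```python
import math

def mae(preds, gts):
    """Mean absolute error between predicted and true per-frame counts."""
    return sum(abs(p - g) for p, g in zip(preds, gts)) / len(gts)

def rmse(preds, gts):
    """Root mean squared error; penalises large miscounts more than MAE."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(preds, gts)) / len(gts))

preds = [10, 12, 31]   # estimated crowd counts per frame
gts = [11, 10, 28]     # annotated ground-truth counts
# mae -> (1 + 2 + 3) / 3 = 2.0 ; rmse -> sqrt((1 + 4 + 9) / 3) ≈ 2.16
```

Precision, Recall, and F1 apply at the detection level (matching individual people or heads) rather than to scalar counts, which is why count-based systems report both families of metrics.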
|
50
|
Ban T, Usui T, Yamamoto T. Spatial Autoregressive Model for Estimation of Visitors' Dynamic Agglomeration Patterns Near Event Location. SENSORS 2021; 21:s21134577. [PMID: 34283103 PMCID: PMC8271624 DOI: 10.3390/s21134577] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Revised: 06/20/2021] [Accepted: 06/29/2021] [Indexed: 11/20/2022]
Abstract
The rapid development of ubiquitous mobile computing has enabled the collection of new types of massive traffic data for understanding collective movement patterns in social spaces. Contributing to the understanding of crowd formation and dispersal in populated areas, we developed a model of visitors’ dynamic agglomeration patterns at a particular event using dynamic population data. This information, a type of big data, comprised aggregate Global Positioning System (GPS) location data collected automatically from mobile phones, without user intervention, over a grid with a spatial resolution of 250 m. Spatial autoregressive models with two-step adjacency matrices are proposed to represent visitors’ movement between grid cells around the event site. We confirmed that the proposed models had a higher goodness-of-fit than models without spatial or temporal autocorrelation. The results also show a significant reduction in accuracy when the models are applied to prediction using estimated values of the endogenous variables from prior time periods.
Collapse
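The spatial autoregressive (SAR) structure above, y = ρWy + Xβ + ε, hinges on the spatially lagged term Wy, where W is a row-normalised adjacency matrix over grid cells. A minimal sketch on a toy three-cell grid (first-order adjacency only, not the paper's two-step matrices):

```python
def row_normalise(adj):
    """Row-normalise a 0/1 adjacency matrix so each row sums to 1."""
    W = []
    for row in adj:
        s = sum(row)
        W.append([v / s if s else 0.0 for v in row])
    return W

def spatial_lag(W, y):
    """Compute Wy, the neighbourhood-averaged visitor counts that enter a
    SAR model y = rho*W*y + X*beta + eps as the autocorrelated term."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in W]

# Three grid cells in a line, 0-1-2; cell 1 neighbours both ends.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
W = row_normalise(adj)
lag = spatial_lag(W, [100.0, 40.0, 60.0])   # visitor counts per cell
# lag == [40.0, 80.0, 40.0]
```

Each cell's lagged value is the mean count of its neighbours, which is how the model lets agglomeration in one grid cell influence the cells around it.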
Affiliation(s)
- Takumi Ban
- Department of Civil Engineering, Graduate School of Engineering, Nagoya University, Nagoya 464-8603, Japan
- Correspondence: (T.B.); (T.Y.)
- Tomotaka Usui
- Faculty of Human Environments, University of Human Environments, Okazaki 444-3505, Japan;
- Toshiyuki Yamamoto
- Institute of Materials and Systems for Sustainability, Nagoya University, Nagoya 464-8603, Japan
- Correspondence: (T.B.); (T.Y.)
Collapse
|