1
Natha S, Ahmed F, Siraj M, Lagari M, Altamimi M, Chandio AA. Deep BiLSTM Attention Model for Spatial and Temporal Anomaly Detection in Video Surveillance. SENSORS (BASEL, SWITZERLAND) 2025; 25:251. [PMID: 39797042 PMCID: PMC11723474 DOI: 10.3390/s25010251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Revised: 12/21/2024] [Accepted: 01/02/2025] [Indexed: 01/13/2025]
Abstract
Detection of anomalies in video surveillance plays a key role in ensuring the safety and security of public spaces. As the number of surveillance cameras grows, monitoring them manually becomes increasingly difficult, which increases the demand for automated systems that detect abnormal events or anomalies, such as road accidents, fighting, snatching, car fires, and explosions, in real time. Such systems improve detection accuracy, minimize human error, and make security operations more efficient. In this study, we propose the Composite Recurrent Bi-Attention (CRBA) model for detecting anomalies in surveillance videos. The CRBA model combines DenseNet201 for robust spatial feature extraction with BiLSTM networks that capture temporal dependencies across video frames. A multi-attention mechanism is also incorporated to direct the model's focus to critical spatiotemporal regions, which improves the system's ability to distinguish between normal and abnormal behaviors. By integrating these methods, the CRBA model improves the detection and classification of anomalies in surveillance videos, addressing both spatial and temporal challenges. Experimental assessments demonstrate that the CRBA model achieves high accuracy on both the University of Central Florida (UCF) dataset and the newly developed Road Anomaly Dataset (RAD). The model enhances detection accuracy while also improving resource efficiency and minimizing response times in critical situations. These advantages make it a valuable tool for public safety and security operations, where rapid and accurate responses are needed to maintain safety.
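The abstract does not specify how the multi-attention mechanism is computed. As a rough illustration of the general idea of focusing on critical temporal regions, the following minimal sketch assumes generic soft (dot-product) temporal attention pooling over per-frame feature vectors such as BiLSTM outputs. The scoring vector `w`, the function names, and the toy clip are hypothetical and are not the CRBA authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frames: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Generic soft temporal attention pooling (illustrative, not the CRBA layer).

    frames: T per-frame feature vectors of length D (e.g. hypothetical BiLSTM outputs).
    w: hypothetical scoring vector of length D; learned in a real model, fixed here.
    Returns (weights over the T frames, pooled D-dimensional clip vector).
    """
    # Relevance score of each frame = dot product with the scoring vector.
    scores = [sum(x * wi for x, wi in zip(f, w)) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Attention-weighted sum of frame features.
    pooled = [sum(a * f[d] for a, f in zip(weights, frames)) for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Toy clip of three 2-D frame features; the last frame stands out along w.
    clip = [[0.1, 0.0], [0.2, 0.1], [2.0, 1.5]]
    weights, pooled = attention_pool(clip, w=[1.0, 1.0])
    print("frame weights:", [round(a, 3) for a in weights])
    print("clip vector:", [round(p, 3) for p in pooled])
```

Frames whose features align with the scoring direction receive most of the weight, so the pooled clip vector is dominated by them; in a trained model the scoring parameters would be learned rather than fixed by hand as they are here.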
Affiliation(s)
- Sarfaraz Natha
  - Department of Information Technology, Quaid e Awam University, Nawabshah 67450, Pakistan
  - Department of Software Engineering, Sir Syed University of Engineering & Technology, Karachi 75300, Pakistan
- Fareed Ahmed
  - Department of Information Technology, Quaid e Awam University, Nawabshah 67450, Pakistan
- Mohammad Siraj
  - Department of Electrical Engineering, College of Engineering, King Saud University, Riyadh 11543, Saudi Arabia
- Mehwish Lagari
  - Department of Information Technology, Quaid e Awam University, Nawabshah 67450, Pakistan
- Majid Altamimi
  - Department of Electrical Engineering, College of Engineering, King Saud University, Riyadh 11543, Saudi Arabia
- Asghar Ali Chandio
  - Department of Information Technology, Quaid e Awam University, Nawabshah 67450, Pakistan
2
Li Z, Song X, Chen S, Demachi K. Armed boundary sabotage: A case study of human malicious behaviors identification with computer vision and explainable reasoning methods. COMPUTERS AND ELECTRICAL ENGINEERING 2025; 121:109924. [DOI: 10.1016/j.compeleceng.2024.109924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2025]
3
Rehman SU, Yasin AU, Ul Haq E, Ali M, Kim J, Mehmood A. Enhancing Human Activity Recognition through Integrated Multimodal Analysis: A Focus on RGB Imaging, Skeletal Tracking, and Pose Estimation. SENSORS (BASEL, SWITZERLAND) 2024; 24:4646. [PMID: 39066043 PMCID: PMC11280841 DOI: 10.3390/s24144646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 07/14/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Human activity recognition (HAR) is pivotal in advancing applications ranging from healthcare monitoring to interactive gaming. Traditional HAR systems, which primarily rely on single data sources, are limited in their ability to capture the full spectrum of human activities. This study introduces a comprehensive approach to HAR that integrates two critical modalities: RGB imaging and advanced pose estimation features. Our methodology leverages the strengths of each modality to overcome the drawbacks of unimodal systems, providing a richer and more accurate representation of activities. We propose a two-stream network that processes skeletal and RGB data in parallel, enhanced by pose estimation techniques for refined feature extraction. The modalities are integrated through advanced fusion algorithms, significantly improving recognition accuracy. Extensive experiments on the UTD multimodal human action dataset (UTD MHAD) demonstrate that the proposed approach outperforms existing state-of-the-art algorithms. This study not only sets a new benchmark for HAR systems but also highlights the importance of feature engineering, and of integrating optimal features, in capturing the complexity of human movements. Our findings pave the way for more sophisticated, reliable, and applicable HAR systems in real-world scenarios.
Affiliation(s)
- Sajid Ur Rehman
  - Department of Creative Technologies, Air University, Islamabad 44000, Pakistan
- Aman Ullah Yasin
  - Department of Creative Technologies, Air University, Islamabad 44000, Pakistan
- Ehtisham Ul Haq
  - Department of Creative Technologies, Air University, Islamabad 44000, Pakistan
- Moazzam Ali
  - Department of Creative Technologies, Air University, Islamabad 44000, Pakistan
- Jungsuk Kim
  - Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
  - Research and Development Laboratory, Cellico Company, Seongnam-si 13449, Republic of Korea
- Asif Mehmood
  - Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
4
Wajid MS, Terashima-Marin H, Paul Rad PN, Wajid MA. Violence Detection Approach based on Cloud Data and Neutrosophic Cognitive Maps. JOURNAL OF CLOUD COMPUTING 2022. [DOI: 10.1186/s13677-022-00369-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/28/2022]
Abstract
Violence has remained a momentous problem since time immemorial. Various scientific studies have been conducted in the recent past to identify the stimuli causing violent behavior among the masses and to achieve the target of cloud data protection. Given the inherent ambiguity or indeterminacy in human behaviour, this study of violence detection appears to be effective, as it identifies a variety of stimuli and character traits that contribute to violent conduct among the masses. This uncertainty in the traits causing violence can easily be seen in surveillance data stored in the cloud and in data collected through academic research. Therefore, to identify violent behavior, we have considered factors (data) from existing research and from cloud data. The factors that lead to violent behavior and are identified by algorithms running in the cloud are termed determinate or certain factors. The factors that were not considered, were rarely identified by the cloud algorithms, or were given less importance are termed indeterminate or uncertain factors. Indeterminate factors are also drawn from expert opinion when the experts cannot take a clear stance or are neutral. Tests are performed using Neutrosophic Cognitive Maps (NCMs) to model violent behavior, taking both determinate and indeterminate factors into consideration. Earlier, these tests were performed using Fuzzy Cognitive Maps (FCMs), in which indeterminate or uncertain factors were not considered. We therefore provide a brief comparison between NCMs and FCMs and show how effective NCMs are when the uncertainty of concepts must be considered while testing for violent behavior. Results are then obtained by forming a neutrosophic adjacency matrix, which is evaluated using linear algebra. The results, in the form of a 1 × n vector (1, I, I, I, I, 1, I, 1, I, I, I, I, I, I, I, I, I, I, I), clearly show the presence of the indeterminate factor ‘I’, which was absent from earlier models designed using FCMs. This shows how indeterminate or uncertain factors play a significant role in cultivating violent behavior, which was not shown in the previous study. The study is significant because it takes into account factors from cloud data, expert opinion, and the literature, and shows how these factors are handled at the data level itself so that they do not impact the modeling stage; machine learning algorithms can then perform well because uncertain and indeterminate information is taken care of during the training phase itself. Hence, uncertainty could be reduced in machine learning algorithms and in the overall recognition of violent behavior.
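The abstract describes forming a neutrosophic adjacency matrix and evaluating it with linear algebra to obtain a 1 × n state vector whose entries are 1 or I. The sketch below illustrates that style of inference under stated assumptions: it uses the usual cognitive-map iteration (multiply the state vector by the adjacency matrix, threshold, keep the triggering concept on, repeat until the vector stops changing) and stores a neutrosophic value a + bI as a pair, with I·I = I. The thresholding rule, the function names, and the three-concept toy map are assumptions for illustration only, not the authors' matrix or exact procedure.

```python
from typing import List, Sequence, Tuple

# A neutrosophic value a + b*I is stored as the pair (a, b); I is indeterminacy, with I*I = I.
Neut = Tuple[float, float]
ZERO: Neut = (0.0, 0.0)
ONE: Neut = (1.0, 0.0)
IND: Neut = (0.0, 1.0)  # the pure indeterminate value I


def n_mul(x: Neut, y: Neut) -> Neut:
    """(a + bI)(c + dI) = ac + (ad + bc + bd)I, because I*I = I."""
    a, b = x
    c, d = y
    return (a * c, a * d + b * c + b * d)


def n_add(x: Neut, y: Neut) -> Neut:
    return (x[0] + y[0], x[1] + y[1])


def threshold(x: Neut) -> Neut:
    """Assumed rule: positive real part -> 1; otherwise any I-part -> I; otherwise 0."""
    a, b = x
    if a > 0:
        return ONE
    if b != 0:
        return IND
    return ZERO


def ncm_infer(
    adj: Sequence[Sequence[Neut]], state: Sequence[Neut], on: int, max_iter: int = 50
) -> List[Neut]:
    """Iterate state <- threshold(state x adj), keeping concept `on` clamped to 1.

    adj[i][j] is the edge weight from concept i to concept j (e.g. ZERO, ONE, (-1.0, 0.0) or IND).
    Stops at a fixed point or after max_iter steps; limit cycles are not detected here.
    """
    n = len(state)
    current = list(state)
    for _ in range(max_iter):
        nxt: List[Neut] = []
        for j in range(n):
            acc = ZERO
            for i in range(n):
                acc = n_add(acc, n_mul(current[i], adj[i][j]))
            nxt.append(threshold(acc))
        nxt[on] = ONE  # the triggering concept stays switched on
        if nxt == current:
            break  # fixed point reached
        current = nxt
    return current


def show(v: Sequence[Neut]) -> str:
    """Render a thresholded state vector using the symbols 1, I and 0."""
    return " ".join("1" if x == ONE else "I" if x == IND else "0" for x in v)


if __name__ == "__main__":
    # Hypothetical 3-concept map: C0 -> C1 is a definite link, C1 -> C2 is indeterminate.
    adj = [
        [ZERO, ONE, ZERO],
        [ZERO, ZERO, IND],
        [ZERO, ZERO, ZERO],
    ]
    print(show(ncm_infer(adj, [ONE, ZERO, ZERO], on=0)))  # prints: 1 1 I
```

The indeterminate edge propagates as I rather than being dropped, which is the behavior that distinguishes an NCM from an FCM, where such an edge would simply be absent or forced to a definite weight.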
5
Identifying human activities in megastores through postural data to monitor shoplifting events. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08028-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
6
Leveraging a Neuroevolutionary Approach for Classifying Violent Behavior in Video. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1279945. [PMID: 35875734 PMCID: PMC9307330 DOI: 10.1155/2022/1279945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2022] [Revised: 06/22/2022] [Accepted: 07/04/2022] [Indexed: 11/17/2022]
Abstract
Security has become a critical issue both for complex and expensive systems and in day-to-day situations. In this regard, the analysis of surveillance camera footage is usually limited by the number of people devoted to the task and by their knowledge and judgment. Nonetheless, different approaches to automating this task have emerged in recent years. These approaches are mainly based on machine learning and benefit from neural networks capable of extracting underlying information from input videos. However competent those networks have proved to be, developers face the challenging task of defining both the architecture and the hyperparameters that allow such networks to work adequately and use computational resources efficiently. This work proposes a model that uses a genetic algorithm to generate neural networks for behavior classification in videos. Two types of neural networks are evolved, shallow and deep, structured on dense and 3D convolutional layers. Each type requires a particular kind of input data: the evolution of the poses of people in the video and video sequences, respectively. Shallow neural networks use a direct encoding approach that maps each part of the chromosome to a part of the phenotype. In contrast, deep neural networks use indirect encoding, with blueprints representing entire networks and modules depicting layers and their connections. Our approach obtained relevant results when tested on the Kranok-NV dataset and evaluated with standard metrics used for similar classification tasks.
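The abstract states that the shallow networks use direct encoding, mapping each part of the chromosome onto part of the phenotype. The following sketch illustrates what direct encoding generally means; the gene layout (one width gene and one activation gene per dense layer), the choice lists, and the helper names are hypothetical and are not taken from the paper, which also evolves deep networks with an indirect blueprint/module encoding not shown here.

```python
import random
from typing import Dict, List, Union

# Hypothetical gene alphabets: a width gene indexes UNIT_CHOICES (0 means "no layer"),
# an activation gene indexes ACTIVATIONS.
UNIT_CHOICES = [0, 16, 32, 64, 128]
ACTIVATIONS = ["relu", "tanh", "sigmoid"]

Layer = Dict[str, Union[int, str]]


def decode(chromosome: List[int]) -> List[Layer]:
    """Direct encoding: gene pair k maps straight onto dense layer k of the phenotype."""
    layers: List[Layer] = []
    for k in range(0, len(chromosome) - 1, 2):
        units = UNIT_CHOICES[chromosome[k]]
        if units == 0:
            continue  # a zero-width gene switches this layer off
        layers.append({"units": units, "activation": ACTIVATIONS[chromosome[k + 1]]})
    return layers


def random_chromosome(n_layers: int, rng: random.Random) -> List[int]:
    """Sample one width gene and one activation gene per potential layer."""
    genes: List[int] = []
    for _ in range(n_layers):
        genes.append(rng.randrange(len(UNIT_CHOICES)))
        genes.append(rng.randrange(len(ACTIVATIONS)))
    return genes


def mutate(chromosome: List[int], rate: float, rng: random.Random) -> List[int]:
    """Point mutation: resample each gene independently with probability `rate`."""
    child = list(chromosome)
    for k in range(len(child)):
        if rng.random() < rate:
            alphabet = len(UNIT_CHOICES) if k % 2 == 0 else len(ACTIVATIONS)
            child[k] = rng.randrange(alphabet)
    return child


if __name__ == "__main__":
    rng = random.Random(42)
    parent = random_chromosome(n_layers=3, rng=rng)
    child = mutate(parent, rate=0.2, rng=rng)
    print("parent genes:", parent, "->", decode(parent))
    print("child genes: ", child, "->", decode(child))
```

Because every gene position corresponds to a fixed part of the network, genetic operators can act directly on the gene list; fitness evaluation (training each decoded network and scoring it on validation data) is omitted from this sketch.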
7
Abstract
Emotional AI is an emerging technology used to make probabilistic predictions about people's emotional states from data sources such as facial (micro-)movements, body language, vocal tone, or choice of words. The performance of such systems is heavily debated, as are the scientific methods underlying many of these technologies. In this article I engage with this new technology and with the debates and literature that surround it. Working at the intersection of criminology, policing, surveillance, and the study of emotional AI, this paper explores the various issues that these technologies present, particularly to liberal democracies, and offers a framework for understanding them. I argue that these technologies should not be deployed in public spaces because there is only a very weak evidence base for their effectiveness in a policing and security context and, even more importantly, because they represent a major intrusion into people’s private lives and a worrying extension of policing power, given that intentions and attitudes may be inferred. Further, the danger of using such invasive surveillance for policing and crime prevention in urban spaces is that it potentially leads to a highly regulated and control-oriented society. I argue that emotion recognition severely impacts the right to the city, not only by surveilling existing situations but also by making inferences and probabilistic predictions about future events, emotions, and intentions.
8
ESAR, An Expert Shoplifting Activity Recognition System. CYBERNETICS AND INFORMATION TECHNOLOGIES 2022. [DOI: 10.2478/cait-2022-0012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Shoplifting is a troubling and pervasive consumer behavior that causes great losses to retailers. It is the theft of goods from stores or shops, usually by hiding an item in a pocket or carrier bag and leaving without paying. Revenue loss is the most direct financial effect of shoplifting. Therefore, this article introduces an Expert Shoplifting Activity Recognition (ESAR) system to reduce shoplifting incidents in stores and shops. The proposed system examines each frame of the video footage and alerts security personnel when shoplifting occurs. It uses a dual-stream convolutional neural network to extract appearance and salient motion features from the video sequences; optical flow and gradient components are used to extract the salient motion features related to shoplifting movements. A Long Short-Term Memory (LSTM)-based deep learner is modeled to learn the extracted features in the time domain and distinguish between normal and shoplifting actions. An additional contribution of this paper is an analysis of the model's behavior in diverse modeling environments. A synthesized shoplifting dataset is used for the experiments. The experimental results show that the proposed approach achieves up to 90.26% detection accuracy, outperforming other prevalent approaches.
9
Reid S, Coleman S, Vance P, Kerr D, O’Neill S. Using Social Signals to Predict Shoplifting: A Transparent Approach to a Sensitive Activity Analysis Problem. SENSORS (BASEL, SWITZERLAND) 2021; 21:6812. [PMID: 34696025 PMCID: PMC8541608 DOI: 10.3390/s21206812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 10/04/2021] [Accepted: 10/08/2021] [Indexed: 11/16/2022]
Abstract
Retail shoplifting is one of the most prevalent forms of theft and accounted for over one billion GBP in losses for UK retailers in 2018. An automated approach to detecting behaviours associated with shoplifting in surveillance footage could help reduce these losses. Until recently, most state-of-the-art vision-based approaches to this problem relied heavily on black-box deep learning models. While these models have been shown to achieve very high accuracy, the lack of understanding of how they make decisions raises concerns about potential bias. This limits retailers' ability to implement these solutions, as several high-profile legal cases have recently ruled that evidence obtained from such black-box methods is inadmissible in court. There is an urgent need for models that achieve high accuracy while providing the necessary transparency. One way to address this problem is to use social signal processing to add a layer of understanding when developing transparent models for this task. To this end, we present a social signal processing model for shoplifting prediction, trained and validated on a novel dataset of manually annotated shoplifting videos. The resulting model provides a high degree of understanding and achieves accuracy comparable to current state-of-the-art black-box methods.
Affiliation(s)
- Shane Reid
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Derry/Londonderry BT48 7JL, UK