1
Lu H, Ehwerhemuepha L, Rakovski C. A comparative study on deep learning models for text classification of unstructured medical notes with various levels of class imbalance. BMC Med Res Methodol 2022; 22:181. PMID: 35780100; PMCID: PMC9250736; DOI: 10.1186/s12874-022-01665-y.
Abstract
BACKGROUND Discharge medical notes written by physicians contain important information about the health condition of patients. Many deep learning algorithms have been successfully applied to extract important information from unstructured medical notes, which can entail subsequent actionable results in the medical domain. This study aims to explore the performance of various deep learning algorithms on text classification tasks on medical notes under different disease class imbalance scenarios. METHODS In this study, we employed seven artificial intelligence models: a CNN (Convolutional Neural Network), a Transformer encoder, a pretrained BERT (Bidirectional Encoder Representations from Transformers), and four typical sequence neural network models, namely RNN (Recurrent Neural Network), GRU (Gated Recurrent Unit), LSTM (Long Short-Term Memory), and Bi-LSTM (Bi-directional Long Short-Term Memory), to classify the presence or absence of 16 disease conditions from patients' discharge summary notes. We analyzed this question as a composition of 16 separate binary classification problems. The performance of the seven models on each of the 16 datasets, with various levels of imbalance between classes, was compared in terms of AUC-ROC (Area Under the Curve of the Receiver Operating Characteristic), AUC-PR (Area Under the Curve of Precision and Recall), F1 score, and balanced accuracy, as well as training time. Model performance was also compared in combination with different word embedding approaches (GloVe, BioWordVec, and no pre-trained word embeddings). RESULTS The analyses of these 16 binary classification problems showed that the Transformer encoder model performs best in nearly all scenarios.
In addition, when the disease prevalence is close to or greater than 50%, the Convolutional Neural Network model achieved performance comparable to the Transformer encoder, and its training time was 17.6% shorter than that of the second-fastest model, 91.3% shorter than the Transformer encoder, and 94.7% shorter than the pre-trained BERT-Base model. The BioWordVec embeddings slightly improved the performance of the Bi-LSTM model in most disease prevalence scenarios, while the CNN model performed better without pre-trained word embeddings. In addition, training time was significantly reduced with the GloVe embeddings for all models. CONCLUSIONS For classification tasks on medical notes, Transformer encoders are the best choice if computational resources are not an issue. Otherwise, when the classes are relatively balanced, CNNs are a leading candidate because of their competitive performance and computational efficiency.
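The F1 score and balanced accuracy this study reports can both be derived from a binary confusion matrix. A minimal stdlib Python sketch (function and variable names are ours, not the authors' code), using a hypothetical rare-disease split:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute F1 score and balanced accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    balanced_accuracy = (recall + specificity) / 2       # robust to class imbalance
    return f1, balanced_accuracy

# Illustrative counts for a low-prevalence disease (10 positives in 100 notes)
f1, bal_acc = classification_metrics(tp=8, fp=4, fn=2, tn=86)
```

Balanced accuracy averages sensitivity and specificity, which is why it remains informative at the low disease prevalences the study examines, where plain accuracy would be dominated by the negative class.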
2
Wang S, Wu J. Patch-Transformer Network: A Wearable-Sensor-Based Fall Detection Method. Sensors (Basel) 2023; 23:6360. PMID: 37514654; PMCID: PMC10384835; DOI: 10.3390/s23146360.
Abstract
Falls can easily cause major harm to the health of the elderly, and timely detection can avoid further injuries. To detect falls in time, we propose a new wearable-sensor-based fall detection method called the Patch-Transformer Network (PTN). The neural network includes a convolution layer, a Transformer encoding layer, and a linear classification layer. The convolution layer extracts local features and projects them into feature matrices. After positional coding information is added, the global features of falls are learned through the multi-head self-attention mechanism in the Transformer encoding layer. Global average pooling (GAP) is used to strengthen the correlation between features and categories. The final classification results are provided by the linear layer. The accuracy of the model on the publicly available datasets SisFall and UniMiB SHAR is 99.86% and 99.14%, respectively. The network model has fewer parameters and lower complexity, with detection times of 0.004 s and 0.001 s on the two datasets. Therefore, our proposed method can detect falls in a timely and accurate manner, which is important for protecting the lives of the elderly.
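Two of the building blocks this abstract names, patch extraction from a sensor stream and global average pooling, can be sketched in a few lines of stdlib Python (a simplified illustration, not the PTN implementation; names are ours):

```python
def make_patches(signal, patch_len):
    """Split a 1-D sensor stream into non-overlapping patches of patch_len
    samples (any trailing remainder is dropped)."""
    return [signal[i:i + patch_len]
            for i in range(0, len(signal) - patch_len + 1, patch_len)]

def global_average_pool(features):
    """Average each feature channel over the time axis.
    features: rows = time steps, columns = channels."""
    n = len(features)
    return [sum(row[c] for row in features) / n for c in range(len(features[0]))]

patches = make_patches(list(range(10)), 4)           # two patches of 4 samples
pooled = global_average_pool([[1.0, 2.0], [3.0, 4.0]])  # one value per channel
```

In a real pipeline the patches would be projected by a convolution layer before self-attention, and GAP would collapse the encoder output to one vector per channel for the linear classifier.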
3
Bedi P, Rani S, Gupta B, Bhasin V, Gole P. EpiBrCan-Lite: A lightweight deep learning model for breast cancer subtype classification using epigenomic data. Comput Methods Programs Biomed 2025; 260:108553. PMID: 39667144; DOI: 10.1016/j.cmpb.2024.108553.
Abstract
BACKGROUND AND OBJECTIVES Early classification of breast cancer subtypes improves the survival rate, as it facilitates patient prognosis. In the literature, this problem has mainly been addressed by various machine learning and deep learning techniques. However, these studies have three major shortcomings: a huge number of trainable weight parameters (TWP), low performance, and the class imbalance problem. METHODS This paper proposes a lightweight model named EpiBrCan-Lite for classifying breast cancer subtypes using DNA methylation data. The model comprises three blocks, namely the Data Encoding, TransGRU, and Classification blocks. In the Data Encoding block, the input features are encoded into equal-sized chunks and then passed to the TransGRU block, a modified version of the traditional Transformer encoder (TE). In the TransGRU block, the MLP module of the traditional TE is replaced by a GRU module consisting of two GRU layers, reducing TWP while capturing the long-range dependencies of the input feature data. The output of the TransGRU block is then passed to the Classification block, which classifies breast cancer into its subtypes. RESULTS The proposed model is validated using accuracy, precision, recall, F1-score, FPR, and FNR metrics on the TCGA breast cancer dataset. This dataset suffers from the class imbalance problem, which is mitigated using the Synthetic Minority Oversampling Technique (SMOTE). Experimental results demonstrate that the EpiBrCan-Lite model attained 95.85% accuracy, 95.96% recall, 95.85% precision, 95.90% F1-score, 1.03% FPR, and 4.12% FNR, despite utilizing only 1/1500 of the TWP of other state-of-the-art models. CONCLUSION The EpiBrCan-Lite model efficiently classifies breast cancer subtypes and, being lightweight, is suitable for deployment on devices with low computational power.
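The SMOTE step mentioned above synthesizes minority-class samples by interpolating between a sample and one of its k nearest neighbours. A stdlib Python sketch of the textbook recipe (not the authors' implementation; names are ours):

```python
import random

def smote_sample(minority, k=2, seed=0):
    """Generate one synthetic minority-class sample: pick a base point, find its
    k nearest neighbours (squared Euclidean distance), and interpolate a random
    fraction of the way toward one of them, as in the original SMOTE."""
    rng = random.Random(seed)
    base = rng.choice(minority)
    neighbours = sorted((p for p in minority if p is not base),
                        key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))[:k]
    nb = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, nb)]

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
synthetic = smote_sample(points)  # lies on a segment between two real samples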
4
Chen X. A novel transformer-based DL model enhanced by position-sensitive attention and gated hierarchical LSTM for aero-engine RUL prediction. Sci Rep 2024; 14:10061. PMID: 38698017; PMCID: PMC11526133; DOI: 10.1038/s41598-024-59095-3.
Abstract
Accurate prediction of remaining useful life (RUL) for aircraft engines is essential for proactive maintenance and safety assurance. However, existing methods such as physics-based models, classical recurrent neural networks, and convolutional neural networks face limitations in capturing long-term dependencies and modeling complex degradation patterns. In this study, we propose a novel deep-learning model based on the Transformer architecture to address these limitations. Specifically, to address the issue of insensitivity to local context in the attention mechanism employed by the Transformer encoder, we introduce a position-sensitive self-attention (PSA) unit to enhance the model's ability to incorporate local context by attending to the positional relationships of the input data at each time step. Additionally, a gated hierarchical long short-term memory network (GHLSTM) is designed to perform regression prediction at different time scales on the latent features, thereby improving the accuracy of RUL estimation for mechanical equipment. Experiments on the C-MAPSS dataset demonstrate that the proposed model outperforms existing methods in RUL prediction, showcasing its effectiveness in modeling complex degradation patterns and long-term dependencies.
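The position-sensitive attention idea, making attention aware of how far apart two time steps are, is commonly realized by adding a relative-position bias to the attention logits before the softmax. A toy stdlib Python sketch of that general mechanism (our illustration of the idea, not the paper's exact PSA formulation):

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def position_sensitive_attention(scores, bias_by_offset):
    """Add a relative-position bias b[i - j] to raw attention scores, then
    normalize each query row with a softmax. Offsets absent from the dict
    get zero bias."""
    n = len(scores)
    biased = [[scores[i][j] + bias_by_offset.get(i - j, 0.0) for j in range(n)]
              for i in range(n)]
    return [softmax(row) for row in biased]

# With uniform raw scores, a bias favouring offset 0 makes each step
# attend most strongly to itself (i.e. its local context).
weights = position_sensitive_attention([[0.0] * 3] * 3, {0: 1.0})
```

In a trained model the bias values would be learned parameters, letting the encoder weight nearby degradation measurements more heavily than distant ones.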
5
Sharma P, Sharma B, Yadav DP, Thakral D, Webber JL. Bladder lesion detection using EfficientNet and hybrid attention transformer through attention transformation. Sci Rep 2025; 15:18042. PMID: 40410301; PMCID: PMC12102296; DOI: 10.1038/s41598-025-02767-5.
Abstract
Bladder cancer diagnosis is a challenging task because of the intricacy and variation of tumor features. Moreover, morphological similarities of the cancerous cells make manual diagnosis time-consuming. Recently, machine learning and deep learning methods have been utilized to diagnose bladder cancer. However, the manual feature requirements of machine learning and the high data volume demanded by deep learning make them less reliable for real-time application. This study developed a hybrid model using a CNN (Convolutional Neural Network) and a less attention-heavy ViT (Vision Transformer) for bladder lesion diagnosis. Our hybrid model contains two blocks of InceptionV3 to extract spatial features. Furthermore, global correlation of the features is achieved using hybrid attention modules incorporated in the ViT encoder. The experimental evaluation of the model on a dataset of 17,540 endoscopic images achieved an average accuracy, precision, and F1-score of 97.73%, 97.21%, and 96.86%, respectively, using a 5-fold cross-validation strategy. We compared the results of the proposed method with CNN- and ViT-based methods under the same experimental conditions and achieved much better performance than our counterparts.
6
Hong J, Lee J, Choi D, Jung J. Depression level prediction via textual and acoustic analysis. Comput Biol Med 2025; 190:110009. PMID: 40157317; DOI: 10.1016/j.compbiomed.2025.110009.
Abstract
Extensive research on automatic depression diagnosis has utilized video data to capture related cues, but data collection is challenging because of privacy concerns. By contrast, voice data offer a less intrusive assessment method and can be analyzed for features such as simple tones, the expression of negative emotions, and a focus on oneself. Recent advancements in multimodal depression-level prediction using speech and text data have gained traction, but most studies overlook the temporal alignment of these modalities, limiting their analysis of the interaction between speech content and intonation. To overcome these limitations, this study introduces timestamp-integrated multimodal encoding for depression (TIMEX-D), which synchronizes the acoustic features of human speech with the corresponding text data to predict depression levels on the basis of their relationship. TIMEX-D comprises three main components: a timestamp extraction block that extracts timestamps from speech and text, a multimodal encoding block that extends positional encoding from transformers to mimic human speech recognition, and a depression analysis block that predicts depression levels while reducing model complexity compared with existing transformers. In experiments on the DAIC-WOZ and EDAIC datasets, TIMEX-D achieved accuracies of 99.17% and 99.81%, respectively, outperforming previous methods by approximately 13%. The effectiveness of TIMEX-D in predicting depression levels can enhance mental health diagnostics and monitoring across various contexts.
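The core synchronization step, mapping timestamped words onto the acoustic frames they overlap, can be sketched in stdlib Python (a generic illustration of timestamp-based alignment, assuming fixed-width frames; names and the frame width are ours, not TIMEX-D's):

```python
def align_words_to_frames(words, frame_hop=0.5):
    """Map each timestamped word to the indices of the acoustic frames it
    overlaps. words: list of (word, start_sec, end_sec) tuples; frames are
    frame_hop seconds wide and non-overlapping."""
    aligned = {}
    for word, start, end in words:
        first = int(start // frame_hop)
        last = int((end - 1e-9) // frame_hop)  # frame containing the word's end
        aligned[word] = list(range(first, last + 1))
    return aligned

# "hello" spans frames 0-1 (0.0-0.6 s); "world" spans frames 1-2 (0.6-1.4 s)
mapping = align_words_to_frames([("hello", 0.0, 0.6), ("world", 0.6, 1.4)])
```

Once each token knows which frames it co-occurs with, the model can encode speech content and intonation jointly instead of fusing the two modalities only at the end.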
7
Zhou R, Fan J, Li S, Zeng W, Chen Y, Zheng X, Chen H, Liao J. LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification. J Cheminform 2024; 16:79. PMID: 38972994; PMCID: PMC11229186; DOI: 10.1186/s13321-024-00871-8.
Abstract
BACKGROUND Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to excessively prioritize local information, thus overlooking global information. Moreover, it is essential to account for the influence of diverse protein folding structural classes: proteins in different structural classes exhibit different biological functions, whereas those within the same structural class share similar functional attributes. RESULTS We proposed LVPocket, a novel method that synergistically captures both local and global information of protein structure through the integration of Transformer encoders, which helps the model achieve better performance in binding pocket prediction. We then tailored prediction models for four distinct structural classes of proteins using transfer learning. The four fine-tuned models were trained from the baseline LVPocket model, which was itself trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned models outperform the baseline model. SCIENTIFIC CONTRIBUTION We present a novel model structure for predicting protein binding pockets that addresses the reliance on extensive convolutional computation at the expense of global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction through the application of transfer learning.
8
Yang T, Wang Y, He Y. TEC-miTarget: enhancing microRNA target prediction based on deep learning of ribonucleic acid sequences. BMC Bioinformatics 2024; 25:159. PMID: 38643080; PMCID: PMC11032603; DOI: 10.1186/s12859-024-05780-z.
Abstract
BACKGROUND MicroRNAs play a critical role in regulating gene expression by binding to specific target sites within gene transcripts, making the identification of microRNA targets a prominent focus of research. Conventional experimental methods for identifying microRNA targets are both time-consuming and expensive, prompting the development of computational tools for target prediction. However, existing computational tools exhibit limited performance in meeting the demands of practical applications, highlighting the need to improve microRNA target prediction models. RESULTS In this paper, we draw on popular natural language processing and computer vision technologies to propose a novel approach, called TEC-miTarget, for microRNA target prediction based on a transformer encoder and convolutional neural networks. TEC-miTarget treats RNA sequences as a natural language and encodes them using a transformer encoder, a widely used encoder in natural language processing. It then combines the representations of a microRNA and its candidate target site sequence into a contact map, a three-dimensional array similar to a multi-channel image. The contact map's features are then extracted using a four-layer convolutional neural network, enabling the prediction of interactions between the microRNA and its candidate target sites. A series of comparative experiments demonstrates that TEC-miTarget significantly improves microRNA target prediction compared with existing state-of-the-art models. Ours is the first approach to be compared with others at both the sequence and transcript levels, and the first to be compared against both deep-learning-based and seed-match-based methods. We first compared TEC-miTarget's performance with sequence-level approaches, and it delivers substantial improvements using the same datasets and evaluation metrics.
Moreover, we utilized TEC-miTarget to predict microRNA targets in long mRNA sequences in two steps: selecting candidate target site sequences and applying sequence-level predictions. We finally showed that TEC-miTarget outperforms other approaches at the transcript level, including the popular seed-match methods widely used in previous years. CONCLUSIONS We propose a novel approach for predicting microRNA targets at both the sequence and transcript levels, and demonstrate that it outperforms other methods based on deep learning or seed matching. We also provide our approach as easy-to-use software, TEC-miTarget, at https://github.com/tingpeng17/TEC-miTarget . Our results provide new perspectives for microRNA target prediction.
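The contact-map construction described above pairs every miRNA position with every target position. One common way to build such a pairwise tensor is to concatenate the two positions' embedding vectors, giving a (len_miRNA x len_target x 2d) array a CNN can treat like a multi-channel image; a stdlib Python sketch of that idea (the paper's exact construction may differ):

```python
def contact_map(mirna_emb, target_emb):
    """Build a pairwise tensor: entry (i, j) concatenates the embedding of
    miRNA position i with the embedding of target position j.
    mirna_emb: list of d-dim vectors; target_emb: list of d-dim vectors.
    Result shape: len(mirna_emb) x len(target_emb) x 2d."""
    return [[m + t for t in target_emb] for m in mirna_emb]

# Toy 2-dim embeddings: a 2-position miRNA against a 3-position target site
cm = contact_map([[1, 2], [3, 4]], [[5, 6], [7, 8], [9, 10]])
# cm[i][j] describes the (miRNA position i, target position j) pair
```

Because every position pair gets its own channel vector, local convolution filters over this map can learn binding patterns such as seed-region complementarity.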
9
Xu K, Wang M, Zou X, Liu J, Wei A, Chen J, Tang C. HSTrans: Homogeneous substructures transformer for predicting frequencies of drug-side effects. Neural Netw 2025; 181:106779. PMID: 39488108; DOI: 10.1016/j.neunet.2024.106779.
Abstract
Identifying the frequencies of drug-side effects is crucial for assessing drug risk-benefit. However, accurately determining these frequencies remains challenging due to the limitations of time and scale in clinical randomized controlled trials. As a result, several computational methods have been proposed to address these issues. Nonetheless, two primary problems still persist. Firstly, most of these methods face challenges in generating accurate predictions for novel drugs, as they heavily depend on the interaction graph between drugs and side effects (SEs) within their modeling framework. Secondly, some previous methods often simply concatenate the features of drugs and SEs, which fails to effectively capture their underlying association. In this work, we present HSTrans, a novel approach that treats drugs and SEs as sets of substructures, leveraging a transformer encoder for unified substructure embedding and incorporating an interaction module for association capture. Specifically, HSTrans extracts drug substructures through a specialized algorithm and identifies effective substructures for each SE by employing an indicator that measures the importance of each substructure and SE. Additionally, HSTrans applies convolutional neural network (CNN) in the interaction module to capture complex relationships between drugs and SEs. Experimental results on datasets from Galeano et al.'s study demonstrate that the proposed method outperforms other state-of-the-art approaches. The demo codes for HSTrans are available at https://github.com/Dtdtxuky/HSTrans/tree/master.
10
Cheng J, Sun K. Heart Sound Classification Network Based on Convolution and Transformer. Sensors (Basel) 2023; 23:8168. PMID: 37836998; PMCID: PMC10575162; DOI: 10.3390/s23198168.
Abstract
Electronic auscultation is vital for doctors to detect symptoms and signs of cardiovascular diseases (CVDs), significantly impacting human health. Although progress has been made in heart sound classification, most existing methods require precise segmentation and feature extraction of heart sound signals before classification. To address this, we introduce an innovative approach for heart sound classification. Our method, named Convolution and Transformer Encoder Neural Network (CTENN), simplifies preprocessing, automatically extracting features using a combination of a one-dimensional convolution (1D-Conv) module and a Transformer encoder. Experimental results showcase the superiority of our proposed method in both binary and multi-class tasks, achieving remarkable accuracies of 96.4%, 99.7%, and 95.7% across three distinct datasets compared with that of similar approaches. This advancement holds promise for enhancing CVD diagnosis and treatment.
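The 1D-Conv module named above slides a small kernel along the heart-sound signal to extract local features automatically, which is what lets CTENN skip hand-crafted segmentation. A minimal stdlib Python sketch of valid-mode 1-D convolution (an illustration of the operation, not the CTENN code):

```python
def conv1d(signal, kernel, stride=1):
    """Valid-mode 1-D convolution (technically cross-correlation, as in
    deep-learning frameworks): slide the kernel along the signal and sum
    the element-wise products at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

# A [1, 0, -1] kernel acts as a discrete edge detector on the signal
out = conv1d([1, 2, 3, 4, 5], [1, 0, -1])
```

In the full network many such kernels are learned, and their responses feed the Transformer encoder, which models longer-range structure across the cardiac cycle.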
11
Benzorgat N, Xia K, Benzorgat MNE. Enhancing brain tumor MRI classification with an ensemble of deep learning models and transformer integration. PeerJ Comput Sci 2024; 10:e2425. PMID: 39650528; PMCID: PMC11623201; DOI: 10.7717/peerj-cs.2425.
Abstract
Brain tumors are widely recognized as a primary cause of cancer-related mortality globally, necessitating precise detection to enhance patient survival rates. Early identification of brain tumors presents significant challenges in the healthcare domain, necessitating precise and efficient diagnostic methodologies. Manual identification and analysis of extensive MRI data is a challenging and laborious task, compounded by the importance of early tumor detection in reducing mortality rates. Prompt initiation of treatment hinges upon identifying the specific tumor type in patients, emphasizing the urgency for a dependable deep learning methodology for precise diagnosis. In this research, a hybrid model is presented that integrates the strengths of both transfer learning and the transformer encoder mechanism. After evaluating the efficacy of six pre-existing deep learning models, both individually and in combination, it was determined that an ensemble of three pretrained models achieved the highest accuracy. This ensemble, comprising DenseNet201, GoogleNet (InceptionV3), and InceptionResNetV2, is selected as the feature extraction framework for the transformer encoder network. The transformer encoder module integrates a Shifted Window-based Self-Attention mechanism and sequential Self-Attention with a multilayer perceptron (MLP) layer. Experiments were conducted on three publicly available research datasets: the Cheng, BT-large-2c, and BT-large-4c datasets, each designed for a different classification task and differing in sample number, planes, and contrast. The model gives consistent results on all three datasets, reaching accuracies of 99.34%, 99.16%, and 98.62%, respectively, improvements over other techniques.
12
Shahid, Hayat M, Alghamdi W, Akbar S, Raza A, Kadir RA, Sarker MR. pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning. Sci Rep 2025; 15:565. PMID: 39747941; PMCID: PMC11695694; DOI: 10.1038/s41598-024-84146-0.
Abstract
Worldwide, cancer remains a significant health concern due to its high mortality rates. Despite numerous traditional therapies and wet-laboratory methods for treating cancer-affected cells, these approaches often face limitations, including high costs and substantial side effects. Recently, the high selectivity of peptides has garnered significant attention from scientists due to their reliable targeted action and minimal adverse effects. Building on the significant outcomes of existing computational models, we propose a highly reliable and effective model, pACP-HybDeep, for the accurate prediction of anticancer peptides. In this model, training peptides are numerically encoded using an attention-based ProtBERT-BFD encoder to extract semantic features, along with CTDT-based structural information. Furthermore, a k-nearest-neighbor-based binary tree growth (BTG) algorithm is employed to select an optimal feature set from the multi-perspective vector. The selected feature vector is subsequently trained using a CNN + RNN-based deep learning model. Our proposed pACP-HybDeep model demonstrated a high training accuracy of 95.33% and an AUC of 0.97. To validate its generalization capabilities, the pACP-HybDeep model achieved accuracies of 94.92%, 92.26%, and 91.16% on the independent datasets Ind-S1, Ind-S2, and Ind-S3, respectively. The demonstrated efficacy and reliability of the pACP-HybDeep model on test datasets establish it as a valuable tool for researchers in academia and pharmaceutical drug design.
13
Mathew MP, Elayidom S, Jagathy Raj VP, Abubeker KM. Development of a handheld GPU-assisted DSC-TransNet model for the real-time classification of plant leaf disease using deep learning approach. Sci Rep 2025; 15:3579. PMID: 39875383; PMCID: PMC11775295; DOI: 10.1038/s41598-024-82629-8.
Abstract
In agriculture, promptly and accurately identifying leaf diseases is crucial for sustainable crop production. To address this requirement, this research introduces a hybrid deep learning model that combines features of the visual geometry group version 19 (VGG19) architecture with transformer encoder blocks. This fusion enables accurate and precise real-time classification of leaf diseases affecting grape, bell pepper, and tomato plants. Incorporating transformer encoder blocks offers enhanced capability in capturing intricate spatial dependencies within leaf images, promising agricultural sustainability and food security. By providing farmers and farming stakeholders with a reliable tool for rapid disease detection, our model facilitates timely intervention and management practices, ultimately leading to improved crop yields and mitigated economic losses. Through extensive comparative analyses on various datasets and field tests, the proposed depthwise separable convolutional TransNet (DSC-TransNet) architecture has demonstrated superior performance in terms of accuracy (99.97%), precision (99.94%), recall (99.94%), sensitivity (99.94%), F1-score (99.94%), and AUC (0.98) for grape leaves across different datasets, including bell pepper and tomato. Furthermore, the DSC layers enhance the computational efficiency of the model while maintaining expressive power, making it well suited for real-time agricultural applications. The developed DSC-TransNet model is deployed on an NVIDIA Jetson Nano single-board computer. This research contributes to advancing the field of automated plant disease classification, addressing critical challenges in modern agriculture and promoting more efficient and sustainable farming practices.
14
Li R, Cao R, Zhao Q, Zhao Z. Utilizing a Novel Convolutional Neural Network for Diagnosis and Lesion Delineation in Colorectal Cancer Screening. J Imaging Inform Med 2025. PMID: 39821781; DOI: 10.1007/s10278-025-01396-8.
Abstract
Early detection of colorectal cancer is vital for enhancing cure rates and alleviating treatment burdens. Nevertheless, the high demand for screenings coupled with a limited number of endoscopists underscores the necessity for advanced deep learning techniques to improve screening efficiency and accuracy. This study presents an innovative convolutional neural network (CNN) model, trained on 8260 images from screenings conducted at four medical institutions. The model incorporates parallel global and local feature extraction branches and a distinctive classification head, facilitating both cancer classification and the creation of heatmaps that outline cancerous lesion regions. Performance evaluations of the CNN model, measured against five leading models using accuracy, precision, recall, and F1 score, revealed its superior efficacy across these metrics. Furthermore, the heatmaps proved effective in aiding the automatic identification of lesion locations. In summary, this CNN model represents a promising advancement in early colorectal cancer screening, delivering precise, swift diagnostic results and robust interpretability through its automatic lesion highlighting capabilities.
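A classification head that also yields lesion heatmaps is often built as a class activation map: a weighted sum of the final convolutional feature maps, one weight per channel. A stdlib Python sketch of that classic recipe, offered as an illustration only (the paper's specific head may differ):

```python
def class_activation_map(feature_maps, class_weights):
    """Classic CAM: sum the last conv layer's feature maps, weighting each
    channel by the classification head's weight for the target class.
    feature_maps: list of channels, each an HxW grid; class_weights: one
    weight per channel. Returns an HxW heatmap."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    return [[sum(wgt * fm[i][j] for wgt, fm in zip(class_weights, feature_maps))
             for j in range(w)]
            for i in range(h)]

# Two 2x2 channels, each active in a different corner; the class weights
# determine how strongly each activation shows up in the heatmap.
cam = class_activation_map([[[1, 0], [0, 0]], [[0, 0], [0, 1]]], [2.0, 3.0])
```

Upsampled to the input resolution, such a map highlights the image regions that drove the cancer prediction, which is the interpretability behavior the abstract describes.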