1
Sun Y, Min X, Duan H, Zhai G. How is Visual Attention Influenced by Text Guidance? Database and Model. IEEE Trans Image Process 2024; 33:5392-5407. PMID: 39312416. DOI: 10.1109/tip.2024.3461956.
Abstract
The analysis and prediction of visual attention have long been crucial tasks in the fields of computer vision and image processing. In practical applications, images are generally accompanied by various text descriptions; however, few studies have explored the influence of text descriptions on visual attention, let alone developed visual saliency prediction models that account for text guidance. In this paper, we conduct a comprehensive study of text-guided image saliency (TIS) from both subjective and objective perspectives. Specifically, we construct a TIS database named SJTU-TIS, which includes 1200 text-image pairs and the corresponding collected eye-tracking data. Based on the established SJTU-TIS database, we analyze the influence of various text descriptions on visual attention. Then, to facilitate the development of saliency prediction models that consider text influence, we construct a benchmark for the SJTU-TIS database using state-of-the-art saliency models. Finally, because most existing saliency models ignore the effect of text descriptions on visual attention, we further propose a text-guided saliency (TGSal) prediction model, which extracts and integrates both image features and text features to predict image saliency under various text-description conditions. Our proposed model significantly outperforms state-of-the-art saliency models on both the SJTU-TIS database and pure image saliency databases in terms of various evaluation metrics. The SJTU-TIS database and the code of the proposed TGSal model will be released at: https://github.com/IntMeGroup/TGSal.
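The abstract does not detail the TGSal architecture; as a rough, assumption-laden illustration of the core idea it describes (extracting image and text features and integrating them to predict a saliency map), a minimal PyTorch-style sketch might look as follows. The module names, dimensions, and fusion scheme here are hypothetical and are not the authors' implementation.

    # Illustrative sketch only: fuse CNN image features with a text embedding
    # and decode a saliency map. Not the authors' TGSal model.
    import torch
    import torch.nn as nn

    class TextGuidedSaliencyNet(nn.Module):
        def __init__(self, text_dim=512, feat_dim=256):
            super().__init__()
            # image encoder: a small conv stack standing in for any CNN backbone
            self.image_encoder = nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            )
            # project a sentence embedding (e.g., from a pretrained text encoder)
            # to the image feature dimension
            self.text_proj = nn.Linear(text_dim, feat_dim)
            # decoder predicts a single-channel saliency map
            self.decoder = nn.Sequential(
                nn.Conv2d(feat_dim, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 1, 1), nn.Sigmoid(),
            )

        def forward(self, image, text_embedding):
            img_feat = self.image_encoder(image)        # (B, C, H', W')
            txt_feat = self.text_proj(text_embedding)   # (B, C)
            txt_feat = txt_feat[:, :, None, None]       # broadcast over spatial dims
            fused = img_feat * txt_feat + img_feat      # simple multiplicative guidance
            sal = self.decoder(fused)                   # (B, 1, H', W')
            return nn.functional.interpolate(
                sal, size=image.shape[-2:], mode="bilinear", align_corners=False)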
2
Tian Y, Sharma A, Mehta S, Kaushal S, Liebmann JM, Cioffi GA, Thakoor KA. Automated Identification of Clinically Relevant Regions in Glaucoma OCT Reports Using Expert Eye Tracking Data and Deep Learning. Transl Vis Sci Technol 2024; 13:24. PMID: 39405074. PMCID: PMC11482640. DOI: 10.1167/tvst.13.10.24.
Abstract
Purpose: To propose a deep learning-based approach for predicting the most-fixated regions on optical coherence tomography (OCT) reports using eye-tracking data from ophthalmologists, assisting them in finding medically salient image regions.
Methods: We collected eye-tracking data from ophthalmology residents, fellows, and faculty as they viewed OCT reports to detect glaucoma. We used a U-Net model as the deep learning backbone and quantized eye-tracking coordinates by dividing the input report into an 11 × 11 grid. The model was trained to predict the grid cells on which fixations would land in unseen OCT reports. We investigated the contribution of different variables, including the viewer's level of expertise, the model architecture, and the number of eye-gaze patterns included in training.
Results: Our approach predicted the most-fixated regions in OCT reports with a precision of 0.723, a recall of 0.562, and an F1-score of 0.609. We found that a grid-based eye-tracking structure enabled efficient training and that a U-Net backbone led to the best performance.
Conclusions: Our approach has the potential to assist ophthalmologists in diagnosing glaucoma by predicting the most medically salient regions of OCT reports. Our study suggests the value of eye tracking in guiding deep learning algorithms toward informative regions when experts may not be accessible.
Translational Relevance: By suggesting OCT report regions important for a glaucoma diagnosis, our model could aid in medical education and serve as a precursor to self-supervised deep learning approaches for expediting early detection of irreversible vision loss owing to glaucoma.
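The 11 × 11 grid quantization of eye-tracking coordinates described above is simple enough to sketch. The following illustrative snippet (function and variable names are assumptions, not the paper's code) bins pixel-level fixations into grid-cell targets suitable for training a model to predict fixated cells.

    # Illustrative sketch: bin fixation coordinates into an 11 x 11 grid,
    # producing a binary target map of fixated cells.
    import numpy as np

    def fixations_to_grid(fixations, image_width, image_height, grid_size=11):
        """fixations: iterable of (x, y) pixel coordinates of gaze fixations."""
        counts = np.zeros((grid_size, grid_size), dtype=np.int64)
        for x, y in fixations:
            col = min(int(x / image_width * grid_size), grid_size - 1)
            row = min(int(y / image_height * grid_size), grid_size - 1)
            counts[row, col] += 1
        # cells that received at least one fixation become positive targets;
        # a count threshold could be used instead to keep only heavily fixated cells
        return (counts > 0).astype(np.float32)

    # example: two fixations on a hypothetical 1650 x 1275 pixel OCT report
    target = fixations_to_grid([(120.5, 300.0), (900.0, 640.2)], 1650, 1275)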
Affiliation(s)
- Ye Tian
- Department of Biomedical Engineering, Columbia University, New York, New York, USA
- Artificial Intelligence for Vision Science Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- Anurag Sharma
- Department of Biomedical Engineering, Columbia University, New York, New York, USA
- Artificial Intelligence for Vision Science Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- Shubh Mehta
- Department of Biomedical Engineering, Columbia University, New York, New York, USA
- Shubham Kaushal
- Data Science Institute, Columbia University, New York, New York, USA
- Artificial Intelligence for Vision Science Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- Jeffrey M. Liebmann
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- George A. Cioffi
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- Kaveri A. Thakoor
- Department of Biomedical Engineering, Columbia University, New York, New York, USA
- Data Science Institute, Columbia University, New York, New York, USA
- Artificial Intelligence for Vision Science Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- Bernard and Shirlee Brown Glaucoma Research Laboratory, Edward S. Harkness Eye Institute, Department of Ophthalmology, Columbia University Irving Medical Center, New York, New York, USA
- Department of Computer Science, Columbia University, New York, New York, USA
3
Dabouei A, Mishra I, Kapur K, Cao C, Bridges AA, Xu M. Deep Video Analysis for Bacteria Genotype Prediction. bioRxiv [Preprint] 2024:2024.09.16.613253. PMID: 39345538. PMCID: PMC11429917. DOI: 10.1101/2024.09.16.613253.
Abstract
Genetic modification of microbes is central to many biotechnology fields, such as industrial microbiology, bioproduction, and drug discovery. Understanding how specific genetic modifications influence observable bacterial behaviors is crucial for advancing these fields. In this study, we propose a supervised model that classifies bacteria harboring single-gene modifications, drawing connections between phenotype and genotype. In particular, we demonstrate that the spatiotemporal patterns of Vibrio cholerae growth, recorded as low-resolution bright-field microscopy videos, are highly predictive of the genotype class. Additionally, we introduce a weakly supervised approach to identify key moments in culture growth that contribute significantly to prediction accuracy. By focusing on the temporal expression of bacterial behavior, our findings offer valuable insights into the underlying mechanisms and developmental stages by which specific genes control observable phenotypes. This research opens new avenues for automating phenotype analysis, with potential applications in drug discovery, disease management, and beyond. Furthermore, this work highlights the potential of machine learning techniques to explore the functional roles of specific genes using only a low-resolution light microscope.
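The abstract does not spell out the weakly supervised mechanism for locating key moments; one common way to let a video-level classifier expose which time points drive its prediction is temporal attention pooling over per-frame features. The sketch below illustrates that general idea only, under assumed names and dimensions, and is not the authors' model.

    # Illustrative sketch: classify a microscopy video from per-frame features and
    # expose per-frame attention weights as candidate "key moments".
    import torch
    import torch.nn as nn

    class AttentionPooledVideoClassifier(nn.Module):
        def __init__(self, frame_feat_dim=512, num_genotypes=10):
            super().__init__()
            self.attn = nn.Linear(frame_feat_dim, 1)               # scores each frame
            self.classifier = nn.Linear(frame_feat_dim, num_genotypes)

        def forward(self, frame_features):                          # (B, T, D) per-frame features
            weights = torch.softmax(self.attn(frame_features), dim=1)   # (B, T, 1)
            pooled = (weights * frame_features).sum(dim=1)               # (B, D)
            # returns genotype logits and per-frame weights for inspection
            return self.classifier(pooled), weights.squeeze(-1)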
4
Chen Y, Li X, Lv N, He Z, Wu B. Automatic detection method for tobacco beetles combining multi-scale global residual feature pyramid network and dual-path deformable attention. Sci Rep 2024; 14:4862. PMID: 38418868. PMCID: PMC10902385. DOI: 10.1038/s41598-024-55347-4.
Abstract
To address the difficulty of identifying the tobacco beetle, a storage pest, in images where the objects occupy few pixels and the images contain considerable noise, and which therefore lack information and identifiable features, this paper proposes an automatic tobacco beetle monitoring method based on a Multi-scale Global residual Feature Pyramid Network and Dual-path Deformable Attention (MGrFPN-DDrGAM). First, a Multi-scale Global residual Feature Pyramid Network (MGrFPN) is constructed to obtain rich high-level semantic features and more complete low-level features, reducing missed detections. Then, a Dual-path Deformable receptive field Guided Attention Module (DDrGAM) is designed to establish long-range channel dependence, guide the effective fusion of features, and improve the localization accuracy of tobacco beetles by fitting their spatial geometric deformation features and capturing the spatial information of feature maps at different scales, thereby enriching the feature information in both the channel and spatial dimensions. Finally, to simulate real scenes, a multi-scene tobacco beetle dataset is created, comprising 28,080 images with manually labeled tobacco beetle objects. The experimental results show that, within the Faster R-CNN framework, the detection precision and recall of this method reach 91.4% and 98.4% at an intersection over union (IoU) threshold of 0.5. Compared with Faster R-CNN and FPN at an IoU threshold of 0.7, the detection precision is improved by 32.9% and 6.9%, respectively. The proposed method outperforms current mainstream methods.
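The reported numbers follow the standard object-detection evaluation protocol of matching predictions to ground-truth boxes at a fixed IoU threshold. A minimal sketch of that protocol (standard definitions, not the paper's evaluation code) is given below.

    # Illustrative sketch: precision and recall of detections at a fixed IoU threshold,
    # the protocol referenced in the abstract (IoU 0.5 / 0.7).
    def iou(box_a, box_b):
        """Boxes are (x1, y1, x2, y2)."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def precision_recall(pred_boxes, gt_boxes, iou_thresh=0.5):
        """Greedy matching: each prediction (assumed sorted by confidence) may claim
        at most one unmatched ground-truth box with IoU above the threshold."""
        matched_gt, tp = set(), 0
        for pb in pred_boxes:
            best_j = max(range(len(gt_boxes)), key=lambda j: iou(pb, gt_boxes[j]), default=None)
            if (best_j is not None and best_j not in matched_gt
                    and iou(pb, gt_boxes[best_j]) >= iou_thresh):
                matched_gt.add(best_j)
                tp += 1
        precision = tp / max(len(pred_boxes), 1)
        recall = tp / max(len(gt_boxes), 1)
        return precision, recall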
Affiliation(s)
- Yuling Chen
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Mianyang Teachers' College, Mianyang, 621000, Sichuan, China
- Xiaoxia Li
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Mianyang, 621010, Sichuan, China
- Nianzu Lv
- Xinjiang Institute of Technology, Aksu, 13558, Xinjiang, China
- Zhenxiang He
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Tianfu College of Southwest University of Finance and Economics, Mianyang, 621000, Sichuan, China
- Bin Wu
- School of Information Engineering, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
- Robot Technology Used for Special Environment Key Laboratory of Sichuan Province, Mianyang, 621010, Sichuan, China
5
Zang M, Mukund P, Forsyth B, Laine AF, Thakoor KA. Predicting Clinician Fixations on Glaucoma OCT Reports via CNN-Based Saliency Prediction Methods. IEEE Open J Eng Med Biol 2024; 5:191-197. PMID: 38606397. PMCID: PMC11008801. DOI: 10.1109/ojemb.2024.3367492.
Abstract
Goal: To predict physician fixations on ophthalmology optical coherence tomography (OCT) reports from eye-tracking data using CNN-based saliency prediction methods, in order to aid the education of ophthalmologists and ophthalmologists-in-training.
Methods: Fifteen ophthalmologists were recruited to each examine 20 randomly selected OCT reports and rate the likelihood of glaucoma for each report on a scale of 0-100. Eye movements were collected using a Pupil Labs Core eye tracker, and fixation heat maps were generated from the fixation data.
Results: A model trained with traditional saliency mapping achieved a correlation coefficient (CC) of 0.208, a Normalized Scanpath Saliency (NSS) of 0.8172, a Kullback-Leibler divergence (KLD) of 2.573, and a Structural Similarity Index (SSIM) of 0.169.
Conclusions: The TranSalNet model was able to predict fixations within certain regions of the OCT report with reasonable accuracy, but more data are needed to improve model accuracy. Future steps include expanding data collection, improving data quality, and modifying the model architecture.
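Two of the metrics reported above, the linear correlation coefficient (CC) and Normalized Scanpath Saliency (NSS), have standard definitions that can be sketched briefly; the snippet below reflects those common definitions rather than the paper's own evaluation code.

    # Illustrative sketch: CC between predicted and ground-truth saliency maps,
    # and NSS of the normalized prediction at fixated pixels.
    import numpy as np

    def cc(pred_map, gt_map):
        p = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-9)
        g = (gt_map - gt_map.mean()) / (gt_map.std() + 1e-9)
        return float((p * g).mean())

    def nss(pred_map, fixation_map):
        """fixation_map is binary: 1 at fixated pixels, 0 elsewhere."""
        p = (pred_map - pred_map.mean()) / (pred_map.std() + 1e-9)
        return float(p[fixation_map.astype(bool)].mean())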