151. Wu Y 吴奕忱, Li S 李晟. Complexity Matters: Normalization to Prototypical Viewpoint Induces Memory Distortion along the Vertical Axis of Scenes. J Neurosci 2024; 44:e1175232024. PMID: 38777600; PMCID: PMC11223457; DOI: 10.1523/jneurosci.1175-23.2024. Received 26 Jun 2023; revised 24 Apr 2024; accepted 13 May 2024. Open access.
Abstract
Scene memory is prone to systematic distortions that potentially arise from experience with the external world. Boundary transformation, a well-known memory distortion effect along the near-far axis of three-dimensional space, represents the observer's erroneous recall of a scene's viewing distance. Researchers have argued that normalization to a prototypical viewpoint with a high-probability viewing distance drives this phenomenon. Here, we hypothesized that a prototypical viewpoint also exists in the vertical angle of view (AOV) dimension and could cause memory distortion along a scene's vertical axis. Two behavioral experiments with human subjects of both sexes revealed a systematic memory distortion in the vertical AOV in both forced-choice (n = 79) and free-adjustment (n = 30) tasks. Regression analysis implied that the complexity-information asymmetry along a scene's vertical axis and independent subjective AOV ratings from a large set of online participants (n = 1,208) could jointly predict AOV biases. In a functional magnetic resonance imaging experiment (n = 24), we demonstrated the involvement of areas in the ventral visual pathway (V3/V4, PPA, and OPA) in AOV bias judgment. Additionally, in a magnetoencephalography experiment (n = 20), we could significantly decode the subjects' AOV bias judgments ∼140 ms after scene onset, and the low-level visual complexity information around a similar temporal interval. These findings suggest that the AOV bias is driven by a normalization process and is associated with neural activity in the early stage of scene processing.
Affiliation(s)
- Yichen Wu 吴奕忱
- School of Psychological and Cognitive Sciences, Peking University, Beijing 100871, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- National Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
- Sheng Li 李晟
- School of Psychological and Cognitive Sciences, Peking University, Beijing 100871, China
- Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing 100871, China
- PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China
- National Key Laboratory of General Artificial Intelligence, Peking University, Beijing 100871, China
152. Yang W, Qiu H, Luo X, Xie S. SGK-Net: A Novel Navigation Scene Graph Generation Network. Sensors (Basel) 2024; 24:4329. PMID: 39001108; PMCID: PMC11244408; DOI: 10.3390/s24134329. Received 24 Apr 2024; revised 26 Jun 2024; accepted 2 Jul 2024.
Abstract
Scene graphs can enhance the understanding capability of intelligent ships in navigation scenes. However, the complex entity relationships and the significant noise in contextual information within navigation scenes pose challenges for navigation scene graph generation (NSGG). To address these issues, this paper proposes a novel NSGG network named SGK-Net. The network comprises three innovative modules. The Semantic-Guided Multimodal Fusion (SGMF) module uses prior information on relationship semantics to fuse multimodal information and construct relationship features, thereby elucidating the relationships between entities and reducing the semantic ambiguity caused by complex relationships. The Graph Structure Learning-based Structure Evolution (GSLSE) module, based on graph structure learning, reduces redundancy in relationship features and optimizes the computational complexity of subsequent contextual message passing. The Key Entity Message Passing (KEMP) module takes full advantage of contextual information to refine relationship features, thereby reducing noise interference from non-key nodes. Furthermore, this paper constructs the first Ship Navigation Scene Graph Simulation dataset, named SNSG-Sim, which provides a foundational dataset for research on ship navigation SGG. Experimental results on the SNSG-Sim dataset demonstrate that our method achieves improvements of 8.31% (R@50) in the PredCls task and 7.94% (R@50) in the SGCls task over the baseline method, validating its effectiveness in navigation scene graph generation.
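The key-entity idea behind KEMP can be illustrated with a toy message-passing loop in which only messages originating from designated key nodes update a node's feature. The graph, scalar features, and `alpha` blending factor below are invented for illustration and are not the paper's actual formulation:

```python
def key_entity_message_passing(features, edges, key_nodes, rounds=2, alpha=0.5):
    # features: node -> scalar feature (scalars stand in for feature vectors)
    # edges: (src, dst) pairs; only messages whose source is a key node are
    # aggregated, mirroring the suppression of noise from non-key nodes.
    feats = dict(features)
    for _ in range(rounds):
        updated = dict(feats)
        for node in feats:
            msgs = [feats[s] for s, d in edges if d == node and s in key_nodes]
            if msgs:
                # Blend the old feature with the mean incoming key-node message.
                updated[node] = (1 - alpha) * feats[node] + alpha * sum(msgs) / len(msgs)
        feats = updated
    return feats

# "noise" sends a message to "buoy" but is not a key node, so it is ignored.
feats = key_entity_message_passing(
    {"ship": 1.0, "buoy": 0.0, "noise": 10.0},
    edges=[("ship", "buoy"), ("noise", "buoy")],
    key_nodes={"ship"},
)
```

After two rounds, `buoy` has been pulled toward the key node's feature while the non-key `noise` node has had no influence on it.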
Affiliation(s)
- Wenbin Yang
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Hao Qiu
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Xiangfeng Luo
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Shaorong Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
153. Zuo X, Li HY, Gao S, Zhang P, Du WR. NALA: a Nesterov accelerated look-ahead optimizer for deep learning. PeerJ Comput Sci 2024; 10:e2167. PMID: 38983239; PMCID: PMC11232586; DOI: 10.7717/peerj-cs.2167. Received 28 Nov 2023; accepted 9 Jun 2024.
Abstract
Adaptive gradient algorithms have been used successfully in deep learning. Previous work reveals that adaptive gradient algorithms mainly borrow the moving-average idea of heavy-ball acceleration to estimate the first- and second-order moments of the gradient for accelerating convergence. However, Nesterov acceleration, which uses the gradient at an extrapolation point, can in theory achieve a faster convergence rate than heavy-ball acceleration. In this article, a new optimization algorithm for deep learning, called NALA, is proposed; it combines an adaptive gradient algorithm with Nesterov acceleration through a look-ahead scheme. NALA iteratively updates two sets of weights: the 'fast weights' in its inner loop and the 'slow weights' in its outer loop. Concretely, NALA first updates the fast weights k times using the Adam optimizer in the inner loop, and then updates the slow weights once in the direction of Nesterov's accelerated gradient (NAG) in the outer loop. We compare NALA with several popular optimization algorithms on a range of image classification tasks on public datasets. The experimental results show that NALA achieves faster convergence and higher accuracy than the other popular optimization algorithms.
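The look-ahead scheme described in the abstract can be sketched in plain Python on a scalar problem. The Adam inner loop is standard; the outer update here is a plain interpolation toward the fast weights, standing in for the paper's Nesterov-direction step, whose exact form the abstract does not give:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update on a scalar weight (bias-corrected moments).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def lookahead_adam(grad_fn, w0, outer_steps=60, k=5, alpha=0.5):
    # k fast (Adam) steps in the inner loop, then one slow-weight update
    # in the outer loop.  NALA replaces this plain interpolation with an
    # update in the NAG direction.
    slow, m, v, t = w0, 0.0, 0.0, 0
    for _ in range(outer_steps):
        fast = slow
        for _ in range(k):
            t += 1
            fast, m, v = adam_step(fast, grad_fn(fast), m, v, t)
        slow = slow + alpha * (fast - slow)
    return slow

# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
w_star = lookahead_adam(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

On this toy quadratic the slow weights settle close to the minimizer at w = 3.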
Affiliation(s)
- Xuan Zuo
- School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Hui-Yan Li
- China Academy of Aerospace Systems Science and Engineering, Beijing, China
- Shan Gao
- School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi, China
- Pu Zhang
- School of Automation and Information Engineering, Xi’an University of Technology, Xi’an, Shaanxi, China
- Wan-Ru Du
- China Academy of Aerospace Systems Science and Engineering, Beijing, China
154
|
Malinverni ES, Abate D, Agapiou A, Stefano FD, Felicetti A, Paolanti M, Pierdicca R, Zingaretti P. SIGNIFICANCE deep learning based platform to fight illicit trafficking of Cultural Heritage goods. Sci Rep 2024; 14:15081. [PMID: 38956250 PMCID: PMC11219783 DOI: 10.1038/s41598-024-65885-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2023] [Accepted: 06/25/2024] [Indexed: 07/04/2024] Open
Abstract
The illicit traffic of cultural goods remains a persistent global challenge, despite the proliferation of comprehensive legislative frameworks developed to address and prevent cultural property crimes. Online platforms, especially social media and e-commerce, have facilitated illegal trade and pose significant challenges for law enforcement agencies. To address this issue, the European project SIGNIFICANCE was launched with the aim of combating illicit traffic of Cultural Heritage (CH) goods. This paper presents the outcomes of the project, introducing a user-friendly platform that employs Artificial Intelligence (AI) and Deep Learning (DL) to prevent and combat illicit activities. The platform enables authorities to identify, track, and block illegal activities in the online domain, thereby aiding successful prosecutions of criminal networks. Moreover, it incorporates an ontology-based approach, providing comprehensive information on the cultural significance, provenance, and legal status of identified artefacts. This enables users to access valuable contextual information during the scraping and classification phases, facilitating informed decision-making and targeted actions. To accomplish these objectives, computationally intensive tasks are executed on the HPC CyClone infrastructure, optimizing computing resources, time, and cost efficiency. Notably, the infrastructure supports algorithm modelling and training, as well as web, dark web, and social media scraping and data classification. Preliminary results indicate a 10-15% increase in the identification of illicit artefacts, demonstrating the platform's effectiveness in enhancing law enforcement capabilities.
Affiliation(s)
- Eva Savina Malinverni
- Dipartimento di Ingegneria Civile, Edile e dell'Architettura (DICEA), Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Dante Abate
- Eratosthenes Center of Excellence, Limassol, 3012, Cyprus
- Antonia Agapiou
- The Cyprus Institute (CyI), Athalassa Campus, Nicosia, Cyprus
- Francesco Di Stefano
- Dipartimento di Ingegneria Civile, Edile e dell'Architettura (DICEA), Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Andrea Felicetti
- VRAI - Vision Robotics and Artificial Intelligence Lab, Dipartimento di Ingegneria dell'Informazione (DII), Università Politecnica delle Marche, 60131, Ancona, Italy
- Marina Paolanti
- Department of Political Sciences, Communication and International Relations, University of Macerata, 62100, Macerata, Italy
- Roberto Pierdicca
- Dipartimento di Ingegneria Civile, Edile e dell'Architettura (DICEA), Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Primo Zingaretti
- VRAI - Vision Robotics and Artificial Intelligence Lab, Dipartimento di Ingegneria dell'Informazione (DII), Università Politecnica delle Marche, 60131, Ancona, Italy
155. Oku T, Furuya S, Lee A, Altenmüller E. Video-based diagnosis support system for pianists with Musician's dystonia. Front Neurol 2024; 15:1409962. PMID: 39015318; PMCID: PMC11250081; DOI: 10.3389/fneur.2024.1409962. Received 31 Mar 2024; accepted 18 Jun 2024. Open access.
Abstract
Background: Musician's dystonia is a task-specific movement disorder that deteriorates fine motor control of skilled movements in musical performance. Although this disorder threatens professional careers, its diagnosis is challenging for clinicians who have no specialized knowledge of musical performance. Objectives: To support diagnostic evaluation, the present study proposes a novel approach that uses a machine learning-based algorithm to identify the symptomatic movements of Musician's dystonia. Methods: We propose an algorithm that identifies dystonic movements using an anomaly detection method with an autoencoder trained on the hand kinematics of healthy pianists. A unique feature of the algorithm is that it requires only video footage of the hand, which can be acquired with a commercially available camera. We also measured hand biomechanical functions to assess the contribution of peripheral factors and improve the identification of dystonic symptoms. Results: The proposed algorithm identified Musician's dystonia with an accuracy and specificity of 90% based only on video footage of the hands. In addition, we identified a degradation of biomechanical functions involved in controlling multiple fingers, which is not specific to musical performance. By contrast, there were no dystonia-specific malfunctions of hand biomechanics, including the strength and agility of individual digits. Conclusion: These findings demonstrate the effectiveness of the present technique in aiding the accurate diagnosis of Musician's dystonia.
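The anomaly-detection logic described in Methods (flag inputs that a model trained only on healthy data reconstructs poorly) can be sketched with a trivial stand-in for the autoencoder. The features, values, and the 1.5 threshold margin below are illustrative only, not the study's actual model:

```python
def fit_mean_model(healthy_samples):
    # Stand-in for the trained autoencoder: "reconstruct" any input as
    # the feature-wise mean of the healthy training data.
    n = len(healthy_samples)
    dim = len(healthy_samples[0])
    return [sum(s[i] for s in healthy_samples) / n for i in range(dim)]

def reconstruction_error(sample, model):
    # Euclidean distance between a sample and its "reconstruction".
    return sum((a - b) ** 2 for a, b in zip(sample, model)) ** 0.5

def is_dystonic(sample, model, threshold):
    # Anomaly detection: inputs the model reconstructs poorly are
    # flagged as lying outside the healthy distribution.
    return reconstruction_error(sample, model) > threshold

healthy = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1]]   # toy kinematic features
model = fit_mean_model(healthy)
# Threshold derived from the healthy reconstruction errors plus a margin.
thr = max(reconstruction_error(s, model) for s in healthy) * 1.5
```

A real autoencoder replaces the mean model, but the decision rule (reconstruction error versus a threshold calibrated on healthy data) is the same.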
Affiliation(s)
- Takanori Oku
- College of Engineering and Design, Shibaura Institute of Technology, Tokyo, Japan
- Sony Computer Science Laboratories, Inc., Tokyo, Japan
- NeuroPiano Institute, Kyoto, Japan
- Shinichi Furuya
- Sony Computer Science Laboratories, Inc., Tokyo, Japan
- NeuroPiano Institute, Kyoto, Japan
- Institute of Music Physiology and Musicians’ Medicine, University of Music, Drama and Media, Hanover, Germany
- André Lee
- Institute of Music Physiology and Musicians’ Medicine, University of Music, Drama and Media, Hanover, Germany
- Department of Neurology, Klinikum rechts der Isar, Technical University of Munich, München, Germany
- Eckart Altenmüller
- Institute of Music Physiology and Musicians’ Medicine, University of Music, Drama and Media, Hanover, Germany
156. Tian Y, Deng N, Xu J, Wen Z. A fine-grained dataset for sewage outfalls objective detection in natural environments. Sci Data 2024; 11:724. PMID: 38956054; PMCID: PMC11219831; DOI: 10.1038/s41597-024-03574-9. Received 3 Apr 2024; accepted 24 Jun 2024. Open access.
Abstract
Pollution sources release contaminants into water bodies via sewage outfalls (SOs). Interpreting SOs from high-resolution images is laborious and expensive because it requires specialized knowledge and must be done by hand. Integrating unmanned aerial vehicles (UAVs) and deep learning technology could assist in constructing an automated SOs detection tool that captures this specialized knowledge. Achieving this objective requires high-quality image datasets for model training and testing. However, no satisfactory dataset of SOs exists. This study presents a high-quality dataset named images for sewage outfalls objective detection (iSOOD). The 10,481 images in iSOOD were captured using UAVs and handheld cameras by individuals across river basins in China. This study has carefully annotated these images to ensure accuracy and consistency. iSOOD has undergone technical validation using the YOLOv10 series of object detection models. Our study provides a high-quality SOs dataset for enhancing deep learning models with UAVs to achieve efficient and intelligent river basin management.
Affiliation(s)
- Yuqing Tian
- School of Environment, Tsinghua University, Beijing, 100084, PR China
- Ning Deng
- School of Environment, Tsinghua University, Beijing, 100084, PR China
- Jie Xu
- Changjiang Basin Ecology and Environment Monitoring and Scientific Research Center, Changjiang Basin Ecology and Environment Administration, Ministry of Ecology and Environment, Wuhan, 430010, China
- Zongguo Wen
- School of Environment, Tsinghua University, Beijing, 100084, PR China
157. Vardhan M, Tanade C, Chen SJ, Mahmood O, Chakravartti J, Jones WS, Kahn AM, Vemulapalli S, Patel M, Leopold JA, Randles A. Diagnostic Performance of Coronary Angiography Derived Computational Fractional Flow Reserve. J Am Heart Assoc 2024; 13:e029941. PMID: 38904250; PMCID: PMC11255717; DOI: 10.1161/jaha.123.029941. Received 12 Jun 2023; accepted 18 Apr 2024.
Abstract
Background: Computational fluid dynamics can compute fractional flow reserve (FFR) accurately. However, existing models are limited by either the intravascular hemodynamic phenomarkers that can be captured or the fidelity of the geometries that can be modeled. Methods and Results: This study aimed to validate a new coronary angiography-based FFR framework, FFRHARVEY, and examine intravascular hemodynamics to identify new biomarkers that could augment FFR in discerning unrevascularized patients requiring intervention. A two-center cohort was used to examine the diagnostic performance of FFRHARVEY compared with reference wire-based FFR (FFRINVASIVE). Additional biomarkers, longitudinal vorticity, velocity, and wall shear stress, were evaluated for their ability to augment FFR and indicate major adverse cardiac events. A total of 160 patients with 166 lesions were investigated. FFRHARVEY was compared with FFRINVASIVE by investigators blinded to the invasive FFR results, with a per-stenosis area under the curve of 0.91, positive predictive value of 90.2%, negative predictive value of 89.6%, sensitivity of 79.3%, and specificity of 95.4%. The percentage of discrepancy for continuous values of FFR was 6.63%. We identified a hemodynamic phenomarker, longitudinal vorticity, as a metric indicative of major adverse cardiac events in unrevascularized gray-zone cases. Conclusions: FFRHARVEY had high performance (area under the curve: 0.91; positive predictive value: 90.2%; negative predictive value: 89.6%) compared with FFRINVASIVE. The proposed framework provides a robust and accurate way to compute a complete set of intravascular phenomarkers, of which longitudinal vorticity was specifically shown to differentiate vessels predisposed to major adverse cardiac events.
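The reported per-stenosis statistics follow from a standard 2×2 confusion matrix. The counts below are our reconstruction, chosen so they are consistent with the reported percentages over the 166 lesions; they are not taken from the paper:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    # Standard confusion-matrix metrics used to compare the
    # angiography-derived FFR against the wire-based reference.
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts: 46 + 5 + 103 + 12 = 166 lesions, and they
# reproduce the reported 79.3% / 95.4% / 90.2% / 89.6%.
m = diagnostic_metrics(tp=46, fp=5, tn=103, fn=12)
```

Rounding each metric to one decimal place recovers the values quoted in the abstract.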
Affiliation(s)
- Cyrus Tanade
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- S. James Chen
- Department of Medicine, University of Colorado, Aurora, CO, USA
- Andrew M. Kahn
- Division of Cardiovascular Medicine, University of California San Diego, La Jolla, CA, USA
- Manesh Patel
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Jane A. Leopold
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA, USA
158. García-Ruiz P, Romero-Ramirez FJ, Muñoz-Salinas R, Marín-Jiménez MJ, Medina-Carnicer R. Large-Scale Indoor Camera Positioning Using Fiducial Markers. Sensors (Basel) 2024; 24:4303. PMID: 39001083; PMCID: PMC11244017; DOI: 10.3390/s24134303. Received 17 Jun 2024; revised 27 Jun 2024; accepted 30 Jun 2024.
Abstract
Estimating the pose of a large set of fixed indoor cameras is a requirement for certain applications in augmented reality, autonomous navigation, video surveillance, and logistics. However, accurately mapping the positions of these cameras remains an unsolved problem. While providing partial solutions, existing alternatives are limited by their dependence on distinct environmental features, the requirement for large overlapping camera views, and specific conditions. This paper introduces a novel approach to estimating the pose of a large set of cameras using a small subset of fiducial markers printed on regular pieces of paper. By placing the markers in areas visible to multiple cameras, we can obtain an initial estimation of the pair-wise spatial relationship between them. The markers can be moved throughout the environment to obtain the relationship between all cameras, thus creating a graph connecting all cameras. In the final step, our method performs a full optimization, minimizing the reprojection errors of the observed markers and enforcing physical constraints, such as camera and marker coplanarity and control points. We validated our approach using novel artificial and real datasets with varying levels of complexity. Our experiments demonstrated superior performance over existing state-of-the-art techniques and increased effectiveness in real-world applications. Accompanying this paper, we provide the research community with access to our code, tutorials, and an application framework to support the deployment of our methodology.
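The step of chaining pair-wise relationships into a graph over all cameras can be sketched as a connectivity check: cameras that co-observe a marker placement get an edge, and every camera's pose is recoverable only if the resulting graph is connected. Camera names and observations below are invented for illustration:

```python
from collections import deque

def camera_graph(observations):
    # observations: list of sets, each the cameras that saw one marker
    # placement; co-observing cameras get a relative-pose edge.
    adj = {}
    for seen_by in observations:
        for a in seen_by:
            for b in seen_by:
                if a != b:
                    adj.setdefault(a, set()).add(b)
    return adj

def all_connected(adj, cameras):
    # BFS from an arbitrary camera; poses can be chained to every
    # other camera only if the graph is connected.
    if not cameras:
        return True
    start = next(iter(cameras))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == set(cameras)

# Three marker placements chain four cameras together.
obs = [{"cam0", "cam1"}, {"cam1", "cam2"}, {"cam2", "cam3"}]
adj = camera_graph(obs)
```

In the paper's pipeline the edges carry relative poses refined by a final global optimization; here they only record which pairs can be related at all.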
Affiliation(s)
- Pablo García-Ruiz
- Departamento de Informática y Análisis Numérico, Edificio Einstein, Campus de Rabanales, Universidad de Córdoba, 14071 Córdoba, Spain
- Francisco J. Romero-Ramirez
- Departamento de Teoría de la Señal y Comunicaciones y Sistemas Telemáticos y Computación, Campus de Fuenlabrada, Universidad Rey Juan Carlos, 28942 Fuenlabrada, Spain
- Rafael Muñoz-Salinas
- Departamento de Informática y Análisis Numérico, Edificio Einstein, Campus de Rabanales, Universidad de Córdoba, 14071 Córdoba, Spain
- Instituto Maimónides de Investigación en Biomedicina (IMIBIC), Avenida Menéndez Pidal s/n, 14004 Córdoba, Spain
- Manuel J. Marín-Jiménez
- Departamento de Informática y Análisis Numérico, Edificio Einstein, Campus de Rabanales, Universidad de Córdoba, 14071 Córdoba, Spain
- Instituto Maimónides de Investigación en Biomedicina (IMIBIC), Avenida Menéndez Pidal s/n, 14004 Córdoba, Spain
- Rafael Medina-Carnicer
- Departamento de Informática y Análisis Numérico, Edificio Einstein, Campus de Rabanales, Universidad de Córdoba, 14071 Córdoba, Spain
- Instituto Maimónides de Investigación en Biomedicina (IMIBIC), Avenida Menéndez Pidal s/n, 14004 Córdoba, Spain
159. Gao Y, Zhang J, Zou C, Bi L, Huang C, Nie J, Yan Y, Yu X, Zhang F, Yao F, Ding L. A method for calculating vector forces at human-mattress interface during sleeping positions utilizing image registration. Sci Rep 2024; 14:15238. PMID: 38956282; PMCID: PMC11220148; DOI: 10.1038/s41598-024-66035-8. Received 25 Mar 2024; accepted 26 Jun 2024. Open access.
Abstract
The vector forces at the human-mattress interface are not only crucial for understanding the distribution of vertical and shear forces exerted on the human body during sleep but also serve as a significant input for biomechanical models of sleeping positions, whose accuracy determines the credibility of predicted musculoskeletal loads. In this study, we introduce a novel method for calculating the interface vector forces. By recording indentations after supine and lateral positions using a vacuum mattress and a 3D scanner, we use image registration techniques to align the body pressure distribution with the mattress deformation scans, thereby calculating the vector force for each unit area (36.25 mm × 36.25 mm). The method was validated with five participants from two perspectives: (1) the mean sum of the vertical force components is 98.67% ± 7.21% of body weight, exhibiting good consistency, and the mean ratio of the horizontal force component to body weight is 2.18% ± 1.77%; (2) the muscle activity predicted by the sleeping-position model with the vector forces as input aligns with the measured muscle activity (%MVC), with correlation coefficients over 0.7. The proposed method contributes to the understanding of vector force distribution and the analysis of musculoskeletal loads during sleep, providing valuable insights for mattress design and evaluation.
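The first validation check can be sketched directly from per-cell force vectors: sum the vertical components and compare against body weight, and take the net in-plane component as shear. The cell values and body weight below are toy numbers, not measured data:

```python
def force_components(cell_forces):
    # cell_forces: (fx, fy, fz) vectors in newtons, one per
    # 36.25 mm x 36.25 mm cell; z is vertical.
    fx_sum = sum(f[0] for f in cell_forces)
    fy_sum = sum(f[1] for f in cell_forces)
    fz_sum = sum(f[2] for f in cell_forces)
    shear = (fx_sum ** 2 + fy_sum ** 2) ** 0.5   # net horizontal force
    return fz_sum, shear

def validation_ratios(cell_forces, body_weight_n):
    # The paper's checks: vertical components should sum to ~100% of
    # body weight, while the net horizontal component stays small.
    fz_sum, shear = force_components(cell_forces)
    return fz_sum / body_weight_n, shear / body_weight_n

# Three toy cells whose horizontal components cancel out.
cells = [(1.0, 0.0, 200.0), (-1.0, 2.0, 300.0), (0.0, -2.0, 190.0)]
v_ratio, s_ratio = validation_ratios(cells, body_weight_n=700.0)
```

Here the vertical sum recovers about 98.6% of the toy body weight, with zero net shear.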
Affiliation(s)
- Ying Gao
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Jing Zhang
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Chengzhao Zou
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Liwen Bi
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Chengzhen Huang
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Jiachen Nie
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Yongli Yan
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Xinli Yu
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
- Fujun Zhang
- De Rucci Healthy Sleep Co., Ltd, Dongguan, 523960, Guangdong, China
- Fanglai Yao
- De Rucci Healthy Sleep Co., Ltd, Dongguan, 523960, Guangdong, China
- Li Ding
- Beijing Advanced Innovation Center for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing, 100191, China
160. Fujimoto K, Ashida H. Influence of scene aspect ratio and depth cues on verticality perception bias. J Vis 2024; 24:12. PMID: 39028900; DOI: 10.1167/jov.24.7.12. Open access.
Abstract
Perceiving verticality is crucial for accurate spatial orientation. Previous research has revealed that tilted scenes can bias verticality perception. This bias can be represented as the sum of multiple periodic functions involved in the perception of visual orientation, but the specific factors affecting each periodicity remain uncertain. This study investigated the influence of the width and depth of an indoor scene on each periodic component of the bias. Participants were presented with an indoor scene showing a rectangular checkerboard room (Experiment 1), a rectangular aperture in the wall (Experiment 2), or a rectangular dotted room (Experiment 3), with various aspect ratios. The stimuli were presented at roll orientations ranging from 90° clockwise to 90° counterclockwise, and participants reported their subjective visual vertical (SVV). The contributions of the 45°, 90°, and 180° periodicities to the SVV error were assessed with a weighted vector sum model. In Experiment 1, the periodic components of the SVV error increased with the aspect ratio. In Experiments 2 and 3, only the 90° component increased with the aspect ratio. These findings suggest that extended transverse surfaces may modulate the periodic components of verticality perception.
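A weighted sum of periodic components can be sketched as follows: a term with period p (in degrees of roll) contributes a sinusoid of frequency 360/p. The pure-sine form with zero phases is our simplification for illustration; the study's actual model parameterization may differ:

```python
import math

def svv_bias(theta_deg, w45, w90, w180):
    # Weighted sum of periodic components of the SVV error.
    # Periods of 45, 90, and 180 degrees correspond to sin(8*theta),
    # sin(4*theta), and sin(2*theta) respectively.
    return (w45 * math.sin(math.radians(8 * theta_deg))
            + w90 * math.sin(math.radians(4 * theta_deg))
            + w180 * math.sin(math.radians(2 * theta_deg)))

# Example: predicted bias at a 20-degree roll with illustrative weights.
bias_at_20 = svv_bias(20.0, w45=0.5, w90=1.0, w180=0.8)
```

Each component repeats with exactly its nominal period, so isolating one weight and shifting the roll angle by that period leaves the bias unchanged.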
Affiliation(s)
- Kanon Fujimoto
- Department of Psychology, Graduate School of Letters, Kyoto University, Japan
- Hiroshi Ashida
- Department of Psychology, Graduate School of Letters, Kyoto University, Japan
161. Ji H, Han P, Li J, Liu X, Liu L. Transformer Discharge Carbon-Trace Detection Based on Improved MSRCR Image-Enhancement Algorithm and YOLOv8 Model. Sensors (Basel) 2024; 24:4309. PMID: 39001089; PMCID: PMC11244463; DOI: 10.3390/s24134309. Received 25 May 2024; revised 17 Jun 2024; accepted 27 Jun 2024.
Abstract
It is difficult to visually detect internal defects in a large transformer with a metal enclosure. For convenient internal inspection, a micro-robot was adopted, and an inspection method based on an image-enhancement algorithm and an improved deep-learning network is proposed in this paper. Considering the dim environment inside the transformer and the irregular imaging distance and fluctuating supplementary lighting during image acquisition by the inspection robot, an improved MSRCR algorithm for image enhancement is proposed; it analyzes the local contrast of the image and enhances details at multiple scales. A white-balance algorithm is also introduced to enhance contrast and brightness and to correct overexposure and color distortion. To improve target recognition of complex carbon-trace defects, the SimAM attention mechanism is incorporated into the Backbone network of the YOLOv8 model to enhance the extraction of carbon-trace features, and the DyHead dynamic detection head is constructed at the output of the model to improve the perception of local carbon traces of different sizes. To increase the defect recognition speed of the inspection robot, the YOLOv8 model is pruned to remove redundant parameters, yielding a lightweight model and improving detection efficiency. To verify the effectiveness of the improved algorithm, the detection model was trained and validated on the carbon-trace dataset. The results show that the MSH-YOLOv8 algorithm achieves an accuracy of 91.80%, which is 3.4 percentage points higher than the original YOLOv8 algorithm, with a significant advantage over other mainstream target-detection algorithms. Meanwhile, the FPS of the proposed algorithm reaches 99.2, indicating that model computation and complexity were successfully reduced, which meets the requirements for engineering applications of the transformer internal-inspection robot.
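The multi-scale surround-subtraction idea behind MSRCR can be sketched on a 1-D signal: at each scale, subtract the log of a blurred "surround" from the log of the signal, then average across scales. The box filter, the scale choices, and the omission of MSRCR's color-restoration and gain/offset steps are simplifications for illustration:

```python
import math

def box_blur(signal, radius):
    # Crude surround estimate; a box filter stands in for the Gaussian
    # surround used by retinex-style algorithms.
    out = []
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def multi_scale_retinex(signal, radii=(1, 2, 4)):
    # MSR: average, over several scales, of log(signal) - log(surround).
    # MSRCR adds color restoration and gain/offset terms on top of this.
    blurred = [box_blur(signal, r) for r in radii]
    return [
        sum(math.log(s) - math.log(b[i]) for b in blurred) / len(radii)
        for i, s in enumerate(signal)
    ]

# A dim signal with one bright local detail in the middle.
enhanced = multi_scale_retinex([10.0, 10.0, 80.0, 10.0, 10.0])
```

The bright detail keeps a positive response relative to its surround while flat regions are pushed toward zero or below, which is the local-contrast behavior the paper exploits.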
Affiliation(s)
- Hongxin Ji
- School of Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Peilin Han
- School of Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jiaqi Li
- School of Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xinghua Liu
- College of Mechanical and Electronic Engineering, Shandong Agricultural University, Tai'an 271018, China
- Liqing Liu
- State Grid Tianjin Electric Power Research Institute, Tianjin 300180, China
162
Lin J, Yin H, Wu Y, Luo J, Ye Q, Zhou B, Xie M, Ye C, Liang J, Li X, Bin W, Yang Z. Stitching method for panoramic nail fold images based on capillary contour enhancement. J Biophotonics 2024:e202400105. [PMID: 38955359 DOI: 10.1002/jbio.202400105] [Received: 03/16/2024] [Revised: 05/21/2024] [Accepted: 06/12/2024] [Indexed: 07/04/2024]
Abstract
Nail fold capillaroscopy is an important means of monitoring human health, and panoramic nail fold images improve the efficiency and accuracy of examinations. However, the acquisition of panoramic nail fold images is seldom studied, and few matching feature points are available when standard image stitching is applied to such images. This paper therefore presents a method for panoramic nail fold image stitching based on vascular contour enhancement, which first addresses the shortage of matching feature points by pre-processing the images with contrast-limited adaptive histogram equalization (CLAHE), bilateral filtering (BF), and sharpening algorithms. The panoramic images of the nail fold blood vessels are then stitched using the speeded-up robust features (SURF), fast library for approximate nearest neighbors (FLANN), and random sample consensus (RANSAC) algorithms. The experimental results show that the panoramic image stitched by this algorithm has a field-of-view width of 7.43 mm, which improves the efficiency and accuracy of diagnosis.
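The stitching stage relies on RANSAC to reject the false matches that survive FLANN matching. A minimal sketch of the consensus principle on a toy line-fitting problem (the paper estimates homographies between overlapping frames instead; `ransac_line` and its parameters are illustrative):

```python
import numpy as np

def ransac_line(points, iters=200, tol=0.1, seed=0):
    # Fit y = a*x + b by random sample consensus: repeatedly fit a model
    # to a random minimal sample (2 points) and keep the model with the
    # largest consensus set of inliers.
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.abs(points[:, 1] - (a * points[:, 0] + b)) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set with least squares.
    x, y = points[best_inliers].T
    a, b = np.polyfit(x, y, 1)
    return a, b, best_inliers
```

Gross outliers (here, mismatched feature points) never join the consensus set, so the final least-squares refit is unaffected by them.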
Affiliation(s)
- Jianan Lin
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Hao Yin
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Yanxiong Wu
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Ji Hua Laboratory, Foshan, Guangdong, China
- Jiaxiong Luo
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Qianyao Ye
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Bin Zhou
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Mugui Xie
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Cong Ye
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Junzhao Liang
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Xiaosong Li
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan, China
- Wei Bin
- State Key Laboratory of Traditional Chinese Medicine Syndrome/Health Construction Center, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
- Zhimin Yang
- State Key Laboratory of Traditional Chinese Medicine Syndrome/Health Construction Center, The Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, China
163
Matloob Abbasi M, Iqbal S, Aurangzeb K, Alhussein M, Khan TM. LMBiS-Net: A lightweight bidirectional skip connection based multipath CNN for retinal blood vessel segmentation. Sci Rep 2024; 14:15219. [PMID: 38956117 PMCID: PMC11219784 DOI: 10.1038/s41598-024-63496-9] [Received: 10/31/2023] [Accepted: 05/29/2024] [Indexed: 07/04/2024]
Abstract
Blinding eye diseases are often related to changes in retinal structure, which can be detected by analysing retinal blood vessels in fundus images. However, existing techniques struggle to accurately segment these delicate vessels. Although deep learning has shown promise in medical image segmentation, its reliance on specific operations can limit its ability to capture crucial details such as vessel edges. This paper introduces LMBiS-Net, a lightweight convolutional neural network designed for the segmentation of retinal vessels. LMBiS-Net achieves exceptional performance with a remarkably low number of learnable parameters (only 0.172 million). The network uses multipath feature extraction blocks and incorporates bidirectional skip connections for information flow between the encoder and decoder. In addition, we have optimised the efficiency of the model by carefully selecting the number of filters to avoid filter overlap. This optimisation significantly reduces training time and improves computational efficiency. To assess LMBiS-Net's robustness and ability to generalise to unseen data, we conducted comprehensive evaluations on four publicly available datasets: DRIVE, STARE, CHASE_DB1, and HRF. The proposed LMBiS-Net obtains sensitivity values of 83.60%, 84.37%, 86.05%, and 83.48%; specificity values of 98.83%, 98.77%, 98.96%, and 98.77%; accuracy scores of 97.08%, 97.69%, 97.75%, and 96.90%; and AUC values of 98.80%, 98.82%, 98.71%, and 88.77% on the DRIVE, STARE, CHASE_DB1, and HRF datasets, respectively. In addition, it records F1 scores of 83.43%, 84.44%, 83.54%, and 78.73% on the same datasets. Our evaluations demonstrate that LMBiS-Net achieves high segmentation accuracy while exhibiting both robustness and generalisability across retinal image datasets, making it a promising tool for various clinical applications.
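The 0.172-million-parameter budget comes almost entirely from how filter counts are chosen, since a 2-D convolution costs k·k·C_in·C_out weights plus C_out biases. A quick counting sketch (the layer widths below are hypothetical, not LMBiS-Net's actual configuration) shows why narrow channel widths matter:

```python
def conv_params(c_in, c_out, k=3, bias=True):
    # Learnable parameters of one 2-D convolution layer:
    # k*k weights per (input-channel, output-channel) pair, plus biases.
    return k * k * c_in * c_out + (c_out if bias else 0)

# Two hypothetical 4-layer encoders: a slim one and a conventional wide one.
slim = sum(conv_params(a, b) for a, b in [(1, 16), (16, 16), (16, 32), (32, 32)])
wide = sum(conv_params(a, b) for a, b in [(1, 64), (64, 64), (64, 128), (128, 128)])
print(slim, wide)  # the slim stack is ~16x smaller
```

Only configurations in the tens of thousands of parameters per stage, like the slim one, can keep a whole network under a 0.172 M budget.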
Affiliation(s)
- Mufassir Matloob Abbasi
- Department of Electrical Engineering, Abasyn University Islamabad Campus (AUIC), Islamabad, 44000, Pakistan
- Shahzaib Iqbal
- Department of Electrical Engineering, Abasyn University Islamabad Campus (AUIC), Islamabad, 44000, Pakistan.
- Khursheed Aurangzeb
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, P. O. Box 51178, 11543, Saudi Arabia
- Musaed Alhussein
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, P. O. Box 51178, 11543, Saudi Arabia
- Tariq M Khan
- School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
164
Kang Q, Lao Q, Gao J, Liu J, Yi H, Ma B, Zhang X, Li K. Deblurring masked image modeling for ultrasound image analysis. Med Image Anal 2024; 97:103256. [PMID: 39047605 DOI: 10.1016/j.media.2024.103256] [Received: 10/16/2023] [Revised: 03/19/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024]
Abstract
Recently, large pretrained vision foundation models based on masked image modeling (MIM) have attracted unprecedented attention and achieved remarkable performance across various tasks. However, MIM for ultrasound imaging remains relatively unexplored, and, most importantly, current MIM approaches fail to account for the gap between natural images and ultrasound, as well as the intrinsic imaging characteristics of the ultrasound modality, such as its high noise-to-signal ratio. In this paper, motivated by this unique property of ultrasound, we propose a deblurring MIM approach specialized for ultrasound, which incorporates a deblurring task into the pretraining proxy task. The deblurring task encourages the pretraining to better recover the subtle details within ultrasound images that are vital for subsequent downstream analysis. Furthermore, we employ a multi-scale hierarchical encoder to extract both local and global contextual cues for improved performance, especially on pixel-wise tasks such as segmentation. We conduct extensive experiments involving 280,000 ultrasound images for pretraining and evaluate the downstream transfer performance of the pretrained model on various disease diagnoses (nodule, Hashimoto's thyroiditis) and task types (classification, segmentation). The experimental results demonstrate the efficacy of the proposed deblurring MIM, which achieves state-of-the-art performance across a wide range of downstream tasks and datasets. Overall, our work highlights the potential of deblurring MIM for ultrasound image analysis, presenting an ultrasound-specific vision foundation model.
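The proxy task combines two corruptions: patch masking (standard MIM) and blurring (the ultrasound-specific addition), with the sharp original as the reconstruction target. A toy data-preparation sketch of that pairing, with an illustrative 4-neighbour average standing in for a proper blur kernel (`make_mim_pair` and all constants are assumptions, not the paper's pipeline):

```python
import numpy as np

def make_mim_pair(img, patch=4, mask_ratio=0.75, seed=0):
    # One (input, target) pair for a deblurring masked-image-modeling task:
    # the input is a blurred copy with most patches zeroed out; the target
    # is the original sharp image, so the network must inpaint the masked
    # patches *and* undo the blur to recover subtle details.
    rng = np.random.default_rng(seed)
    h, w = img.shape
    blurred = img.astype(float).copy()
    # Cheap blur stand-in: average interior pixels with their 4 neighbours.
    blurred[1:-1, 1:-1] = (img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
                           + img[1:-1, :-2] + img[1:-1, 2:]) / 5.0
    mask_small = rng.random((h // patch, w // patch)) < mask_ratio
    mask = np.kron(mask_small.astype(int), np.ones((patch, patch), int)).astype(bool)
    return np.where(mask, 0.0, blurred), img.astype(float), mask
```

Plain MIM would use `img` itself (masked) as input; feeding the blurred copy instead is what turns reconstruction into joint inpainting plus deblurring.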
Affiliation(s)
- Qingbo Kang
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China
- Qicheng Lao
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China.
- Jun Gao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; College of Computer Science, Sichuan University, Chengdu, Sichuan, 610041, China
- Jingyan Liu
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Huahui Yi
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Buyun Ma
- Department of Ultrasonography, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China
- Xiaofan Zhang
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China; Shanghai Jiao Tong University, Shanghai, 200240, China
- Kang Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan, 610041, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200030, China.
165
Nie T, Zhao Y, Yao S. ELA-Net: An Efficient Lightweight Attention Network for Skin Lesion Segmentation. Sensors (Basel) 2024; 24:4302. [PMID: 39001081 PMCID: PMC11243870 DOI: 10.3390/s24134302] [Received: 06/04/2024] [Revised: 06/25/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024]
Abstract
In clinical settings limited by equipment, lightweight skin lesion segmentation is pivotal, as it allows the model to be integrated into diverse medical devices and improves operational efficiency. However, a lightweight design may suffer accuracy degradation, especially on complex images such as skin lesion images with irregular regions and blurred or oversized boundaries. To address these challenges, we propose an efficient lightweight attention network (ELANet) for the skin lesion segmentation task. In ELANet, the two different attention mechanisms of the bilateral residual module (BRM) provide complementary information, enhancing sensitivity to features in the spatial and channel dimensions, respectively, and multiple BRMs are stacked for efficient feature extraction. In addition, the network acquires global information and improves segmentation accuracy by passing feature maps of different scales through multi-scale attention fusion (MAF) operations. Finally, we evaluate ELANet on three publicly available datasets, ISIC2016, ISIC2017, and ISIC2018; the experimental results show that our algorithm achieves mIoU scores of 89.87%, 81.85%, and 82.87% on the three datasets with only 0.459 M parameters, an excellent balance between accuracy and model size that is superior to many existing segmentation methods.
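The headline metric, mean intersection-over-union, is straightforward to compute from discrete masks; a minimal implementation (the class-skipping convention when a class is absent from both masks is one common choice, not necessarily the paper's exact protocol):

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    # Mean intersection-over-union across classes; a class absent from
    # both prediction and ground truth is skipped rather than scored 0.
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union:
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```

Because IoU averages over classes rather than pixels, a tiny lesion mis-segmented against a large background drags the score down far more than plain pixel accuracy would.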
Affiliation(s)
- Tianyu Nie
- School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
- Yishi Zhao
- School of Computer Science, China University of Geosciences, Wuhan 430074, China
- Engineering Research Center of Natural Resource Information Management and Digital Twin Engineering Software, Ministry of Education, Wuhan 430074, China
- Shihong Yao
- School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
166
Chiu CH, Chen YJ, Wu Y, Shi Y, Ho TY. Achieve fairness without demographics for dermatological disease diagnosis. Med Image Anal 2024; 95:103188. [PMID: 38718715 DOI: 10.1016/j.media.2024.103188] [Received: 12/28/2023] [Revised: 04/29/2024] [Accepted: 04/29/2024] [Indexed: 06/01/2024]
Abstract
In medical image diagnosis, fairness has become increasingly crucial. Without bias mitigation, deploying unfair AI would harm the interests of the underprivileged population and potentially tear society apart. Recent research addresses prediction biases in deep learning models concerning demographic groups (e.g., gender, age, and race) by utilizing demographic (sensitive attribute) information during training. However, many sensitive attributes naturally exist in dermatological disease images. If the trained model only targets fairness for a specific attribute, it remains unfair for other attributes. Moreover, training a model that can accommodate multiple sensitive attributes is impractical due to privacy concerns. To overcome this, we propose a method enabling fair predictions for sensitive attributes during the testing phase without using such information during training. Inspired by prior work highlighting the impact of feature entanglement on fairness, we enhance the model features by capturing the features related to the sensitive and target attributes and regularizing the feature entanglement between corresponding classes. This ensures that the model can only classify based on the features related to the target attribute without relying on features associated with sensitive attributes, thereby improving fairness and accuracy. Additionally, we use disease masks from the Segment Anything Model (SAM) to enhance the quality of the learned feature. Experimental results demonstrate that the proposed method can improve fairness in classification compared to state-of-the-art methods in two dermatological disease datasets.
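Fairness of this kind is typically audited after training by comparing performance across demographic groups that the model never saw; a common summary is the worst-group accuracy and the best-to-worst gap. A minimal audit sketch (the metric choice and the toy data are illustrative, not the paper's evaluation protocol):

```python
import numpy as np

def worst_group_accuracy(correct, groups):
    # Per-group accuracy over held-out predictions; `groups` holds the
    # sensitive attribute, used only at audit time, never for training.
    accs = {g: correct[groups == g].mean() for g in np.unique(groups)}
    return min(accs.values()), max(accs.values()) - min(accs.values())

# Toy audit: 8 test cases, two demographic groups "a" and "b".
correct = np.array([1, 1, 0, 1, 1, 1, 0, 0], dtype=float)
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
worst, gap = worst_group_accuracy(correct, groups)
```

A method that "achieves fairness without demographics" aims to shrink `gap` for every plausible grouping at once, which is exactly why per-attribute debiasing during training is insufficient.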
Affiliation(s)
- Ching-Hao Chiu
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan.
- Yu-Jen Chen
- National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu, Taiwan
- Yawen Wu
- University of Notre Dame, Holy Cross Dr, Notre Dame, IN, USA
- Yiyu Shi
- University of Notre Dame, Holy Cross Dr, Notre Dame, IN, USA
- Tsung-Yi Ho
- The Chinese University of Hong Kong, Shatin, NT, Hong Kong
167
Fu J, Yang Z, Melemenidis S, Viswanathan V, Dutt S, Manjappa R, Lau B, Soto LA, Ashraf MR, Skinner L, Yu SJ, Surucu M, Casey KM, Rankin EB, Graves E, Lu W, Loo BW, Gu X. Exploring Deep Learning for Estimating the Isoeffective Dose of FLASH Irradiation From Mouse Intestinal Histological Images. Int J Radiat Oncol Biol Phys 2024; 119:1001-1010. [PMID: 38171387 DOI: 10.1016/j.ijrobp.2023.12.032] [Received: 07/05/2023] [Revised: 12/09/2023] [Accepted: 12/23/2023] [Indexed: 01/05/2024]
Abstract
PURPOSE Ultrahigh-dose-rate (FLASH) irradiation has been reported to reduce normal tissue damage compared with conventional dose rate (CONV) irradiation without compromising tumor control. This proof-of-concept study aims to develop a deep learning (DL) approach to quantify the FLASH isoeffective dose (the CONV dose that would be required to produce the same effect as a given physical FLASH dose) from postirradiation mouse intestinal histology images. METHODS AND MATERIALS Eighty-four healthy C57BL/6J female mice underwent 16 MeV electron CONV (0.12 Gy/s; n = 41) or FLASH (200 Gy/s; n = 43) single-fraction whole-abdominal irradiation. Physical dose ranged from 12 to 16 Gy for FLASH and 11 to 15 Gy for CONV in 1 Gy increments. Four days after irradiation, 9 jejunum cross-sections from each mouse were hematoxylin and eosin stained and digitized for histological analysis. The CONV data set was randomly split into training (n = 33) and testing (n = 8) sets. ResNet101-based DL models were retrained on the CONV training set to estimate dose from histological features. The classical manual crypt counting (CC) approach was implemented for comparison. Cross-section-wise mean squared error was computed to evaluate the dose estimation accuracy of both approaches. The validated DL model was then applied to the FLASH data set to map the physical FLASH dose to the isoeffective dose. RESULTS The DL model achieved a cross-section-wise mean squared error of 0.20 Gy² on the CONV testing set, compared with 0.40 Gy² for the CC approach. Isoeffective doses estimated by the DL model for FLASH doses of 12, 13, 14, 15, and 16 Gy were 12.19 ± 0.46, 12.54 ± 0.37, 12.69 ± 0.26, 12.84 ± 0.26, and 13.03 ± 0.28 Gy, respectively. CONCLUSIONS Our proposed DL model achieved accurate CONV dose estimation, and its results indicate that in the physical dose range of 13 to 16 Gy, the biologic response of small intestinal tissue to FLASH irradiation corresponds to a lower isoeffective dose than the physical dose. Our DL approach can serve as a tool for studying isoeffective doses of other dose-modifying interventions.
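The underlying logic, calibrate a tissue-damage readout against CONV dose and then invert that calibration for FLASH samples, can be sketched with the classical crypt-count readout that the DL model replaces with learned histological features. All numbers below are illustrative, not the study's data:

```python
import numpy as np

# Hypothetical CONV calibration: surviving crypt counts fall roughly
# linearly with dose over this range (illustrative values only).
conv_dose = np.array([11.0, 12.0, 13.0, 14.0, 15.0])
conv_crypts = np.array([120.0, 100.0, 80.0, 60.0, 40.0])

a, b = np.polyfit(conv_dose, conv_crypts, 1)  # crypts ~ a*dose + b

def isoeffective_dose(flash_crypts):
    # Invert the CONV calibration: the CONV dose that would produce the
    # same tissue effect (crypt count) as the observed FLASH response.
    return (flash_crypts - b) / a
```

A FLASH section showing more surviving crypts than its physical dose predicts maps to a lower isoeffective dose, which is the normal-tissue-sparing signal the study quantifies.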
Affiliation(s)
- Jie Fu
- Department of Radiation Oncology, University of Washington, Seattle, Washington; Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Zi Yang
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California; Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, Texas
- Stavros Melemenidis
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Vignesh Viswanathan
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Suparna Dutt
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Rakesh Manjappa
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Brianna Lau
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Luis A Soto
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- M Ramish Ashraf
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Lawrie Skinner
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Shu-Jung Yu
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Murat Surucu
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Kerriann M Casey
- Department of Comparative Medicine, Stanford University School of Medicine, Stanford, California
- Erinn B Rankin
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Edward Graves
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California
- Weiguo Lu
- Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, University of Texas Southwestern Medical Center, Dallas, Texas
- Billy W Loo
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California.
- Xuejun Gu
- Department of Radiation Oncology, Stanford University School of Medicine, Stanford, California.
168
Li X, Chi X, Huang P, Liang Q, Liu J. Deep neural network for the prediction of KRAS, NRAS, and BRAF genotypes in left-sided colorectal cancer based on histopathologic images. Comput Med Imaging Graph 2024; 115:102384. [PMID: 38759471 DOI: 10.1016/j.compmedimag.2024.102384] [Received: 11/24/2023] [Revised: 04/14/2024] [Accepted: 04/14/2024] [Indexed: 05/19/2024]
Abstract
BACKGROUND The KRAS, NRAS, and BRAF genotypes are critical for selecting targeted therapies for patients with metastatic colorectal cancer (mCRC). Here, we aimed to develop a deep learning model that utilizes pathologic whole-slide images (WSIs) to accurately predict the status of KRAS, NRAS, and BRAFV600E. METHODS 129 patients with left-sided colon cancer and rectal cancer from the Third Affiliated Hospital of Sun Yat-sen University were assigned to the training and testing cohorts. Utilizing three convolutional neural networks (ResNet18, ResNet50, and Inception v3), we extracted 206 pathological features from H&E-stained WSIs, serving as the foundation for constructing specific pathological models. A clinical feature model was then developed, with carcinoembryonic antigen (CEA) identified through comprehensive multiple regression analysis as the key biomarker. Subsequently, these two models were combined to create a clinical-pathological integrated model, resulting in a total of three genetic prediction models. RESULTS 103 patients were evaluated in the training cohort (1,782,302 image tiles), while the remaining 26 patients were enrolled in the testing cohort (489,481 image tiles). Compared with the clinical model and the pathology model, the combined model, which incorporated CEA levels and pathological signatures, showed increased predictive ability, with an area under the curve (AUC) of 0.96 in the training cohort and 0.83 in the testing cohort, accompanied by a high positive predictive value (PPV, 0.92). CONCLUSION The combined model demonstrated a considerable ability to accurately predict the status of KRAS, NRAS, and BRAFV600E in patients with left-sided colorectal cancer, with potential to assist doctors in developing targeted treatment strategies for mCRC patients and to identify mutations without the need for confirmatory genetic testing.
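The AUC values reported above have a simple rank-based reading: the probability that a randomly chosen mutated case receives a higher model score than a randomly chosen wild-type case. A direct implementation of that (Mann-Whitney) formulation, with toy scores for illustration:

```python
import numpy as np

def auc(scores, labels):
    # Area under the ROC curve via the rank (Mann-Whitney) formulation:
    # the probability that a random positive outscores a random negative,
    # with ties counted as half a win.
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```

An AUC of 0.83 on the testing cohort means roughly five out of six random mutated/wild-type pairs are ranked correctly; the PPV of 0.92 additionally depends on the chosen decision threshold and the mutation prevalence.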
Affiliation(s)
- Xuejie Li
- Department of Gastrointestinal Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, PR China
- Xianda Chi
- Department of Gastrointestinal Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, PR China
- Pinjie Huang
- Department of Anaesthesia, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, PR China
- Qiong Liang
- Department of Pathology, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, PR China.
- Jianpei Liu
- Department of Gastrointestinal Surgery, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, Guangdong, PR China.
169
Zhu Z, Nan L, Xie H, Chen H, Wang J, Wei M, Qin J. CSDN: Cross-Modal Shape-Transfer Dual-Refinement Network for Point Cloud Completion. IEEE Trans Vis Comput Graph 2024; 30:3545-3563. [PMID: 37018698 DOI: 10.1109/tvcg.2023.3236061] [Indexed: 06/19/2023]
Abstract
How would you repair a physical object with missing parts? You might imagine its original shape from previously captured images, first recover its overall (global) but coarse shape, and then refine its local details. We are motivated to imitate this physical repair procedure to address point cloud completion. To this end, we propose a cross-modal shape-transfer dual-refinement network (termed CSDN), a coarse-to-fine paradigm with full-cycle participation of images, for high-quality point cloud completion. CSDN mainly consists of "shape fusion" and "dual-refinement" modules to tackle the cross-modal challenge. The first module transfers the intrinsic shape characteristics of single images to guide the geometry generation of the missing regions of point clouds, in which we propose IPAdaIN to embed the global features of both the image and the partial point cloud into the completion. The second module refines the coarse output by adjusting the positions of the generated points: the local refinement unit exploits the geometric relation between the generated and the input points via graph convolution, while the global constraint unit uses the input image to fine-tune the generated offsets. Unlike most existing approaches, CSDN not only explores the complementary information from images but also effectively exploits cross-modal data throughout the whole coarse-to-fine completion procedure. Experimental results indicate that CSDN performs favorably against twelve competitors on the cross-modal benchmark.
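IPAdaIN builds on adaptive instance normalization (AdaIN): one feature set is re-normalized to match the per-channel statistics of another, which is how image-derived shape characteristics can steer point-cloud feature generation. This sketch shows only the underlying AdaIN operation; how IPAdaIN conditions it on the global image and partial-cloud codes is specific to the paper:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    # Adaptive instance normalization over (channels, elements) feature
    # maps: strip the content features' per-channel mean/std and replace
    # them with the style (here: image-derived) features' statistics.
    c_mu = content.mean(-1, keepdims=True)
    c_sd = content.std(-1, keepdims=True)
    s_mu = style.mean(-1, keepdims=True)
    s_sd = style.std(-1, keepdims=True)
    return s_sd * (content - c_mu) / (c_sd + eps) + s_mu
```

After the transfer, the content features carry the style branch's channel statistics while keeping their own spatial arrangement, the mechanism that lets one modality guide another.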
170
Liu M, Yang Z, Han W, Xie S. Progressive Neighbor-masked Contrastive Learning for Fusion-style Deep Multi-view Clustering. Neural Netw 2024; 179:106503. [PMID: 38986189 DOI: 10.1016/j.neunet.2024.106503] [Received: 02/26/2024] [Revised: 05/09/2024] [Accepted: 06/29/2024] [Indexed: 07/12/2024]
Abstract
Fusion-style Deep Multi-view Clustering (FDMC) can efficiently integrate comprehensive feature information from latent embeddings of multiple views and has drawn much attention recently. However, existing FDMC methods suffer from the interference of view-specific information for fusion representation, affecting the learning of discriminative cluster structure. In this paper, we propose a new framework of Progressive Neighbor-masked Contrastive Learning for FDMC (PNCL-FDMC) to tackle the aforementioned issues. Specifically, by using neighbor-masked contrastive learning, PNCL-FDMC can explicitly maintain the local structure during the embedding aggregation, which is beneficial to the common semantics enhancement on the fusion view. Based on the consistent aggregation, the fusion view is further enhanced by diversity-aware cluster structure enhancement. In this process, the enhanced cluster assignments and cluster discrepancies are employed to guide the weighted neighbor-masked contrastive alignment of semantic structure between individual views and the fusion view. Extensive experiments validate the effectiveness of the proposed framework, revealing its ability in discriminative representation learning and improving clustering performance.
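The neighbor-masking idea can be isolated in a small InfoNCE sketch: each anchor's known nearest neighbours are removed from the negative set, so aligning two views cannot push semantically close samples apart and the local structure survives the aggregation. This is a generic illustration of the mechanism, not the paper's progressive multi-view formulation; `nm_contrastive_loss` and its arguments are assumptions:

```python
import numpy as np

def nm_contrastive_loss(z1, z2, neighbors, tau=0.5):
    # InfoNCE between two views of a batch. For anchor i, position i in
    # the other view is the positive; entries listed in neighbors[i] are
    # masked out of the denominator so neighbours never act as negatives.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau
    n = len(z1)
    mask = np.zeros_like(sim, dtype=bool)
    for i, nbrs in enumerate(neighbors):
        mask[i, nbrs] = True                      # neighbours: not negatives
    mask[np.arange(n), np.arange(n)] = False      # keep the positive pair
    sim = np.where(mask, -np.inf, sim)            # exp(-inf) = 0 below
    log_prob = sim[np.arange(n), np.arange(n)] - np.log(np.exp(sim).sum(axis=1))
    return float(-log_prob.mean())
```

Masking a neighbour strictly removes a repulsive term from the denominator, so the loss can only decrease, which is the formal sense in which local structure is protected.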
Affiliation(s)
- Mingyang Liu
- School of Automation, Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, 510006, China.
- Zuyuan Yang
- School of Automation, Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, 510006, China; Guangdong-Hong Kong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou 510006, China.
- Wei Han
- Guangzhou Railway Polytechnic, Guangzhou, 511300, China.
- Shengli Xie
- School of Automation, Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, 510006, China; Key Laboratory of iDetection and Manufacturing-IoT, Ministry of Education, Guangzhou 510006, China.
171
Sheng X, Li L, Liu D, Li H. VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision. IEEE Trans Pattern Anal Mach Intell 2024; 46:4579-4596. [PMID: 38252583 DOI: 10.1109/tpami.2024.3356548] [Indexed: 01/24/2024]
Abstract
Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, we propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis, thereby being versatile for both human and machine vision. Our VNVC framework has a feature-based compression loop. In the loop, one frame is encoded into compact representations and decoded to an intermediate feature that is obtained before performing reconstruction. The intermediate feature can be used as reference in motion compensation and motion estimation through feature-based temporal context mining and cross-domain motion encoder-decoder to compress the following frames. The intermediate feature is directly fed into video reconstruction, video enhancement, and video analysis networks to evaluate its effectiveness. The evaluation shows that our framework with the intermediate feature achieves high compression efficiency for video reconstruction and satisfactory task performances with lower complexities.
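The core economy of the framework, transmit a compact representation and hand the decoded *feature* (not pixels) to reconstruction, enhancement, and analysis heads, can be illustrated with a toy feature codec. Uniform quantization stands in for VNVC's learned entropy coding and motion pipeline; `code_features` and its `step` parameter are illustrative assumptions:

```python
import numpy as np

def code_features(feat, step=0.5):
    # Toy feature-space codec: uniform scalar quantization of an
    # intermediate feature map. `symbols` is what would be entropy-coded
    # and transmitted; `decoded` is the decoder-side feature that can be
    # fed straight into downstream heads without a return trip to pixels.
    symbols = np.round(feat / step).astype(int)
    decoded = symbols * step
    return symbols, decoded
```

The coarser the step, the fewer bits the symbols need but the larger the feature distortion; the decoded feature is bounded within half a step of the original, which is what keeps downstream task performance usable.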
172
Jiang Z, Sun S, Peng H, Liu Z, Wang J. Multiple-in-Single-Out Object Detector Leveraging Spiking Neural Membrane Systems and Multiple Transformers. Int J Neural Syst 2024; 34:2450035. [PMID: 38616293 DOI: 10.1142/s0129065724500357] [Indexed: 04/16/2024]
Abstract
Most existing multi-scale object detectors depend on multi-level feature maps. The Feature Pyramid Networks (FPN) is a significant architecture for object detection that utilizes these multi-level feature maps. However, the use of FPN also increases the detector's complexity. For object detection methods that only use a single-level feature map, the detection performance is limited to some extent because the single-level feature map cannot balance deep semantic information and shallow detail information. We introduce a novel detector - the Spiking Neural P Multiple-in-Single-out (SNPMiSo) detector to address these challenges. The SNPMiSo detector is constructed based on SNP-like neurons. In SNPMiSo, we employ two kinds of Transformers to boost the important features across different-level feature maps separately. After enhancing the features, we use an incremental upsampling module to upsample and merge the two feature maps. This combined feature map is input into the NAF dilated residual module and the NAF dual-branch detection head. This process allows us to extract multi-scale features and carry out detection tasks. Our tests show promising results: On the COCO dataset, SNPMiSo attains an Average Precision (AP) of 38.7, an improvement of 1.0 AP over YOLOF. In addition, SNPMiSo demonstrates a quicker detection speed, outperforming some advanced multi-level and single-level object detectors.
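The incremental-upsampling step, bring the deeper (coarser) feature map to the shallower map's resolution and merge the two, is easy to sketch with nearest-neighbour repetition and element-wise addition. SNPMiSo's actual module is learned; the function below only illustrates the resolution-matching and merging idea:

```python
import numpy as np

def upsample_and_merge(deep, shallow):
    # Nearest-neighbour upsample the coarser feature map to the shallow
    # map's spatial size, then merge by element-wise addition so deep
    # semantics and shallow detail share one map.
    rh = shallow.shape[0] // deep.shape[0]
    rw = shallow.shape[1] // deep.shape[1]
    up = np.kron(deep, np.ones((rh, rw)))
    return up + shallow
```

The merged single map is what lets a single-level head see both deep semantic context and shallow detail, sidestepping the full FPN pyramid.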
Affiliation(s)
- Zhengyuan Jiang
- School of Computer and Software Engineering, Xihua University, Chengdu 610039, P. R. China
- Siyan Sun
- School of Computer and Software Engineering, Xihua University, Chengdu 610039, P. R. China
- Hong Peng
- School of Computer and Software Engineering, Xihua University, Chengdu 610039, P. R. China
- Zhicai Liu
- School of Computer and Software Engineering, Xihua University, Chengdu 610039, P. R. China
- Jun Wang
- School of Electrical Engineering and Electronic Information, Xihua University, Chengdu 610039, P. R. China
173
Leventhal S, Gyulassy A, Heimann M, Pascucci V. Exploring Classification of Topological Priors With Machine Learning for Feature Extraction. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:3959-3972. [PMID: 37027638 DOI: 10.1109/tvcg.2023.3248632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
In many scientific endeavors, increasingly abstract representations of data allow for new interpretive methodologies and conceptualization of phenomena. For example, moving from raw imaged pixels to segmented and reconstructed objects gives researchers new insights and means to direct their studies toward relevant areas. Thus, the development of new and improved methods for segmentation remains an active area of research. With advances in machine learning and neural networks, scientists have focused on employing deep neural networks such as U-Net to obtain pixel-level segmentations, namely, defining associations between pixels and corresponding/referent objects and gathering those objects afterward. Topological analysis, such as the use of the Morse-Smale complex to encode regions of uniform gradient flow behavior, offers an alternative approach: first create geometric priors, and then apply machine learning to classify. This approach is empirically motivated, since phenomena of interest often appear as subsets of topological priors in many applications. Using topological elements not only reduces the learning space but also introduces the ability to use learnable geometries and connectivity to aid the classification of the segmentation target. In this article, we describe an approach to creating learnable topological elements, explore the application of ML techniques to classification tasks in a number of areas, and demonstrate this approach as a viable alternative to pixel-level classification, achieving similar accuracy with improved execution time and only marginal training data requirements.
174
Wang G, Datta A, Lindquist MA. Improved fMRI-based pain prediction using Bayesian group-wise functional registration. Biostatistics 2024; 25:885-903. [PMID: 37805937 DOI: 10.1093/biostatistics/kxad026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Revised: 08/22/2023] [Accepted: 08/27/2023] [Indexed: 10/10/2023] Open
Abstract
In recent years, the field of neuroimaging has undergone a paradigm shift, moving away from the traditional brain mapping approach towards the development of integrated, multivariate brain models that can predict categories of mental events. However, large interindividual differences in both brain anatomy and functional localization after standard anatomical alignment remain a major limitation in performing this type of analysis, as it leads to feature misalignment across subjects in subsequent predictive models. This article addresses this problem by developing and validating a new computational technique for reducing misalignment across individuals in functional brain systems by spatially transforming each subject's functional data to a common latent template map. Our proposed Bayesian functional group-wise registration approach allows us to assess differences in brain function across subjects and individual differences in activation topology. We achieve the probabilistic registration with inverse-consistency by utilizing the generalized Bayes framework with a loss function for the symmetric group-wise registration. It models the latent template with a Gaussian process, which helps capture spatial features in the template, producing a more precise estimation. We evaluate the method in simulation studies and apply it to data from an fMRI study of thermal pain, with the goal of using functional brain activity to predict physical pain. We find that the proposed approach allows for improved prediction of reported pain scores over conventional approaches.
Affiliation(s)
- Guoqing Wang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
- Abhirup Datta
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
- Martin A Lindquist
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, MD 21205, USA
175
Huang C, Shi Y, Zhang B, Lyu K. Uncertainty-aware prototypical learning for anomaly detection in medical images. Neural Netw 2024; 175:106284. [PMID: 38593560 DOI: 10.1016/j.neunet.2024.106284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 03/14/2024] [Accepted: 03/29/2024] [Indexed: 04/11/2024]
Abstract
Anomalous object detection (AOD) in medical images aims to recognize anomalous lesions and is crucial for the early clinical diagnosis of various cancers. However, it is a difficult task for two reasons: (1) the diversity of anomalous lesions and (2) the ambiguity of the boundary between anomalous lesions and their normal surroundings. Unlike existing single-modality AOD models based on deterministic mapping, we constructed a jointly probabilistic and deterministic AOD model. Specifically, we designed an uncertainty-aware prototype learning framework that accounts for the diversity and ambiguity of anomalous lesions. A prototypical learning transformer (Pformer) is established to extract and store the prototype features of different anomalous lesions. Moreover, a Bayesian neural uncertainty quantizer, a probabilistic model, is designed to model distributions over the model's outputs and thereby measure the uncertainty of its detection result for each pixel. Essentially, the uncertainty of the anomaly detection result for a pixel reflects the anomalous ambiguity of that pixel. Furthermore, an uncertainty-guided reasoning transformer (Uformer) is devised to exploit this ambiguity, encouraging the model to focus on pixels with high uncertainty. Notably, the prototypical representations stored in Pformer are also utilized in anomaly reasoning, enabling the model to perceive the diversity of anomalous objects. Extensive experiments on five benchmark datasets demonstrate the superiority of our proposed method. The source code will be available at github.com/umchaohuang/UPformer.
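The role of a per-pixel uncertainty estimate (measuring how much stochastic predictions disagree) can be approximated with a Monte-Carlo-style toy. The linear "detector" and dropout rate below are assumptions for illustration only, not the paper's Bayesian neural uncertainty quantizer.

```python
import numpy as np

def stochastic_forward(x, rng, drop=0.3):
    """One stochastic pass: a fixed sigmoid 'detector' with random dropout,
    standing in for sampling from a posterior over model outputs."""
    mask = rng.random(x.shape) > drop
    return 1 / (1 + np.exp(-(x * mask)))      # per-pixel anomaly probability

def predict_with_uncertainty(x, n=50, seed=0):
    """Mean over stochastic passes is the prediction; the spread across
    passes is the per-pixel uncertainty."""
    rng = np.random.default_rng(seed)
    samples = np.stack([stochastic_forward(x, rng) for _ in range(n)])
    return samples.mean(axis=0), samples.std(axis=0)

scores = np.array([[4.0, 0.0], [-4.0, 0.2]])  # toy per-pixel logits
mean, unc = predict_with_uncertainty(scores)
```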
Affiliation(s)
- Chao Huang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China; Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
- Yushu Shi
- Shenzhen Campus of Sun Yat-sen University, School of Cyber Science and Technology, Shenzhen, 518107, China
- Bob Zhang
- PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, 519000, Macao Special Administrative Region of China
- Ke Lyu
- School of Engineering Sciences, University of the Chinese Academy of Sciences, Beijing, 100049, China; Pengcheng Laboratory, Shenzhen, 518055, China
176
de Silva A, Zhao M, Stewart D, Khan FH, Dusek G, Davis J, Pang A. RipViz: Finding Rip Currents by Learning Pathline Behavior. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:3930-3944. [PMID: 37022897 DOI: 10.1109/tvcg.2023.3243834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
We present RipViz, a hybrid machine learning and flow analysis feature detection method that extracts rip currents from stationary videos. Rip currents are dangerously strong currents that can drag beachgoers out to sea. Most people are either unaware of them or do not know what they look like. In some instances, even trained personnel such as lifeguards have difficulty identifying them. RipViz produces a simple, easy-to-understand visualization of the rip current's location overlaid on the source video. With RipViz, we first obtain an unsteady 2D vector field from the stationary video using optical flow, and the movement at each pixel is analyzed over time. At each seed point, sequences of short pathlines, rather than a single long pathline, are traced across the frames of the video to better capture the quasi-periodic flow behavior of wave activity. Because of the motion on the beach, in the surf zone, and in the surrounding areas, these pathlines may still appear very cluttered and incomprehensible. Furthermore, lay audiences are not familiar with pathlines and may not know how to interpret them. To address this, we treat rip currents as a flow anomaly in an otherwise normal flow. To learn the normal flow behavior, we train an LSTM autoencoder with pathline sequences from normal ocean, foreground, and background movements. At test time, we use the trained LSTM autoencoder to detect anomalous pathlines (i.e., those in the rip zone). The origination points of such anomalous pathlines, over the course of the video, are then presented as points within the rip zone. RipViz is fully automated and does not require user input. Feedback from domain experts suggests that RipViz has the potential for wider use.
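The detection principle (learn a compact code for normal pathlines, then flag pathlines that the learned model reconstructs poorly) can be sketched by substituting a linear (PCA) autoencoder for the LSTM autoencoder. The wave-like synthetic pathlines below are hypothetical stand-ins for optical-flow traces.

```python
import numpy as np

def fit_linear_autoencoder(X, k=2):
    """PCA as a linear stand-in for the LSTM autoencoder: learn a k-dim
    code from 'normal' pathline sequences (rows of X)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def reconstruction_error(X, mu, basis):
    """Anomalous pathlines reconstruct poorly from the normal-flow code."""
    Z = (X - mu) @ basis.T
    Xr = Z @ basis + mu
    return np.linalg.norm(X - Xr, axis=1)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 20)
# Quasi-periodic "normal" pathlines: phase-shifted waves span a 2D subspace
normal = np.stack([np.sin(2 * np.pi * (t + p)) for p in rng.random(100)])
mu, basis = fit_linear_autoencoder(normal, k=2)
rip = rng.normal(size=(1, 20))            # erratic pathline, unlike training flow
err_normal = reconstruction_error(normal, mu, basis).mean()
err_rip = reconstruction_error(rip, mu, basis)[0]
```

Thresholding the reconstruction error then separates rip-zone pathlines from normal wave motion, mirroring the anomaly-detection step above.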
177
Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Detection of Vocal Fold Image Obstructions in High-Speed Videoendoscopy During Connected Speech in Adductor Spasmodic Dysphonia: A Convolutional Neural Networks Approach. J Voice 2024; 38:951-962. [PMID: 35304042 PMCID: PMC9474736 DOI: 10.1016/j.jvoice.2022.01.028] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Revised: 01/30/2022] [Accepted: 01/30/2022] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Adductor spasmodic dysphonia (AdSD) is a neurogenic voice disorder affecting intrinsic laryngeal muscle control. AdSD leads to involuntary laryngeal spasms that reveal themselves only during connected speech. Laryngeal high-speed videoendoscopy (HSV) coupled with a flexible fiberoptic endoscope provides a unique opportunity to study voice production and visualize the vocal fold vibrations in AdSD during speech. The goal of this study is to automatically detect instances during which the image of the vocal folds is optically obstructed in HSV recordings obtained during connected speech. METHODS HSV data were recorded from vocally normal adults and patients with AdSD during reading of the "Rainbow Passage", six CAPE-V sentences, and production of the vowel /i/. A convolutional neural network was developed and trained as a classifier to detect obstructed/unobstructed vocal folds in HSV frames. Manually labelled data were used for training, validation, and testing of the network. Moreover, a comprehensive robustness evaluation was conducted to compare the performance of the developed classifier with visual analysis of HSV data. RESULTS The developed convolutional neural network was able to automatically detect vocal fold obstructions in HSV data from vocally normal participants and AdSD patients. The trained network was tested successfully and showed an overall classification accuracy of 94.18% on the testing dataset. The robustness evaluation showed an average overall accuracy of 94.81% on a large number of HSV frames, demonstrating that the introduced technique is highly robust while maintaining a high level of accuracy. CONCLUSIONS The proposed approach can be used for efficient analysis of HSV data to study laryngeal maneuvers in patients with AdSD during connected speech. Additionally, this method will facilitate the development of vocal fold vibratory measures for HSV frames with an unobstructed view of the vocal folds. Indicating the parts of connected speech that provide an unobstructed view of the vocal folds can aid in developing optimal passages for precise HSV examination during connected speech and subject-specific clinical voice assessment protocols.
Affiliation(s)
- Ahmed M Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
178
Zhang O, Dahlquist N, Leete Z, Xu M, Schneider D, Yang C. Long-term imaging of three-dimensional hyphal development using the ePetri dish. BIOMEDICAL OPTICS EXPRESS 2024; 15:4292-4299. [PMID: 39022548 PMCID: PMC11249690 DOI: 10.1364/boe.530483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2024] [Revised: 06/07/2024] [Accepted: 06/07/2024] [Indexed: 07/20/2024]
Abstract
Imaging three-dimensional microbial development and behavior over extended periods is crucial for advancing microbiological studies. Here, we introduce an upgraded ePetri dish system specifically designed for extended microbial culturing and 3D imaging, addressing the limitations of existing methods. Our approach includes a sealed growth chamber to enable long-term culturing, and a multi-step reconstruction algorithm that integrates 3D deconvolution, image filtering, and ridge and skeleton detection for detailed visualization of the hyphal network. The system effectively monitored the development of Aspergillus brasiliensis hyphae over a seven-day period, demonstrating the growth medium's stability within the chamber. The system's 3D imaging capability was validated in a volume of 5.5 mm × 4 mm × 0.5 mm, revealing a radial growth pattern of fungal hyphae. Additionally, we show that the system can identify potential filter failures that are undetectable with 2D imaging. With these capabilities, the upgraded ePetri dish represents a significant advancement in long-term 3D microbial imaging, promising new insights into microbial development and behavior across various microbiological research areas.
Affiliation(s)
- Oumeng Zhang
- Division of Engineering and Applied Science, California Institute of Technology, 1200 E California Blvd., Pasadena, CA 91125, USA
- Nic Dahlquist
- Mango Inc, 1314 Westwood Blvd., Los Angeles, CA 90024, USA
- Zachary Leete
- Mango Inc, 1314 Westwood Blvd., Los Angeles, CA 90024, USA
- Michael Xu
- Mango Inc, 1314 Westwood Blvd., Los Angeles, CA 90024, USA
- Dean Schneider
- Mango Inc, 1314 Westwood Blvd., Los Angeles, CA 90024, USA
- Changhuei Yang
- Division of Engineering and Applied Science, California Institute of Technology, 1200 E California Blvd., Pasadena, CA 91125, USA
179
Yoshioka H, Jin R, Hisaka A, Suzuki H. Disease progression modeling with temporal realignment: An emerging approach to deepen knowledge on chronic diseases. Pharmacol Ther 2024; 259:108655. [PMID: 38710372 DOI: 10.1016/j.pharmthera.2024.108655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 04/22/2024] [Accepted: 05/01/2024] [Indexed: 05/08/2024]
Abstract
The recent development of the first disease-modifying drug for Alzheimer's disease represents a major advancement in dementia treatment. Behind this breakthrough is a quarter century of research efforts to understand the disease not by a particular symptom at a given moment, but by long-term sequential changes in multiple biomarkers. Disease progression modeling with temporal realignment (DPM-TR) is an emerging computational approach proposed with this biomarker-based disease concept. By integrating short-term clinical observations of multiple disease biomarkers in a data-driven manner, DPM-TR provides a way to understand the progression of chronic diseases over decades and predict individual disease stages more accurately. DPM-TR has been developed primarily in the area of neurodegenerative diseases but has recently been extended to non-neurodegenerative diseases, including chronic obstructive pulmonary, autoimmune, and ophthalmologic diseases. This review focuses on opportunities for DPM-TR in clinical practice and drug development and discusses its current status and challenges.
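The essence of temporal realignment (placing each subject's short observation window onto a common long-term disease trajectory) can be sketched as a one-parameter grid search. The logistic template and the synthetic subjects below are illustrative assumptions, not a clinical model.

```python
import numpy as np

def template(t):
    """Assumed common long-term trajectory of a biomarker (logistic curve
    progressing over roughly 20 'years' of disease time)."""
    return 1 / (1 + np.exp(-(t - 10) / 2))

def align_subject(times, values, grid=np.linspace(0, 20, 201)):
    """Find the disease-time offset that best aligns a subject's short
    observation window onto the template (least-squares grid search)."""
    errs = [np.sum((template(times + s) - values) ** 2) for s in grid]
    return grid[int(np.argmin(errs))]

# Two subjects, each observed over only 3 'years', at different true stages
obs_t = np.array([0.0, 1.0, 2.0])
early = template(obs_t + 4.0)    # truly 4 years into the disease
late = template(obs_t + 13.0)    # truly 13 years in
s_early = align_subject(obs_t, early)
s_late = align_subject(obs_t, late)
```

Stitching many such realigned windows together is what lets short clinical observations reconstruct a decades-long progression curve.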
Affiliation(s)
- Hideki Yoshioka
- Office of Regulatory Science Research, Pharmaceuticals and Medical Devices Agency, Tokyo, Japan; Laboratory of Clinical Pharmacology and Pharmacometrics, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Ryota Jin
- Laboratory of Clinical Pharmacology and Pharmacometrics, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Akihiro Hisaka
- Laboratory of Clinical Pharmacology and Pharmacometrics, Graduate School of Pharmaceutical Sciences, Chiba University, Chiba, Japan
- Hiroshi Suzuki
- Executive Director, Pharmaceuticals and Medical Devices Agency, Tokyo, Japan; Department of Pharmacy, The University of Tokyo Hospital, Faculty of Medicine, The University of Tokyo, Tokyo, Japan
180
Pundhir A, Sagar S, Singh P, Raman B. Echoes of images: multi-loss network for image retrieval in vision transformers. Med Biol Eng Comput 2024; 62:2037-2058. [PMID: 38436836 DOI: 10.1007/s11517-024-03055-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Accepted: 02/16/2024] [Indexed: 03/05/2024]
Abstract
This paper introduces a novel approach to enhance content-based image retrieval, validated on two benchmark datasets: ISIC-2017 and ISIC-2018. These datasets comprise skin lesion images that are crucial for innovations in skin cancer diagnosis and treatment. We advocate the use of a pre-trained Vision Transformer (ViT), a relatively uncharted concept in the realm of image retrieval, particularly in medical scenarios. In contrast to the traditionally employed Convolutional Neural Networks (CNNs), our findings suggest that ViT offers a more comprehensive understanding of image context, which is essential in medical imaging. We further incorporate a weighted multi-loss function, examining losses such as triplet loss, distillation loss, contrastive loss, and cross-entropy loss. We investigate the most resilient combination of these losses to create a robust multi-loss function, enhancing the robustness of the learned feature space and improving precision and recall in the retrieval process. Rather than using all the loss functions, the proposed multi-loss function combines only cross-entropy loss, triplet loss, and distillation loss, improving mean average precision by 6.52% and 3.45% on ISIC-2017 and ISIC-2018, respectively. Another innovation in our methodology is a two-branch network strategy that concurrently boosts image retrieval and classification. Through our experiments, we underscore the effectiveness and the pitfalls of diverse loss configurations in image retrieval. Furthermore, our approach underlines the advantages of retrieval-based classification through majority voting rather than relying solely on the classification head, leading to enhanced prediction for melanoma - the most lethal type of skin cancer. Our results surpass existing state-of-the-art techniques on the ISIC-2017 and ISIC-2018 datasets, improving mean average precision by 1.01% and 4.36%, respectively, emphasizing the efficacy and promise of Vision Transformers paired with our tailor-made weighted loss function, especially in medical contexts. The proposed approach's effectiveness is substantiated through thorough ablation studies and an array of quantitative and qualitative outcomes. To promote reproducibility and support forthcoming research, our source code will be accessible on GitHub.
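The reported loss combination (cross-entropy plus triplet plus distillation) can be written down concretely. The NumPy sketch below uses illustrative weights and toy inputs; the actual weights, logits, and embeddings are learned quantities in the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, label):
    """Standard classification loss on a single example."""
    return -np.log(softmax(logits)[label])

def triplet(anchor, pos, neg, margin=0.2):
    """Pull the positive embedding closer than the negative by a margin."""
    d_ap = np.linalg.norm(anchor - pos)
    d_an = np.linalg.norm(anchor - neg)
    return max(0.0, d_ap - d_an + margin)

def distillation(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened distributions."""
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def multi_loss(logits, label, anchor, pos, neg, teacher_logits,
               w=(1.0, 0.5, 0.5)):
    """Weighted sum of the three losses the abstract reports as the most
    effective combination (weights here are illustrative)."""
    return (w[0] * cross_entropy(logits, label)
            + w[1] * triplet(anchor, pos, neg)
            + w[2] * distillation(logits, teacher_logits))

logits = np.array([2.0, 0.5])
total = multi_loss(logits, 0,
                   anchor=np.array([0.0, 0.0]),
                   pos=np.array([0.1, 0.0]),
                   neg=np.array([1.0, 1.0]),
                   teacher_logits=np.array([2.2, 0.3]))
```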
Affiliation(s)
- Anshul Pundhir
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India
- Shivam Sagar
- Department of Electrical Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India
- Pradeep Singh
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India
- Balasubramanian Raman
- Department of Computer Science and Engineering, Indian Institute of Technology, Roorkee, 247667, Uttarakhand, India
181
Ye Q, Yin H, Lin J, Liang J, Xie M, Ye C, Zhou B, Huang A, Wu Z, Li X, Wu Y. Improved nested U-structure for accurate nailfold capillary segmentation. Microvasc Res 2024; 154:104680. [PMID: 38484792 DOI: 10.1016/j.mvr.2024.104680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 02/28/2024] [Accepted: 03/11/2024] [Indexed: 03/18/2024]
Abstract
Changes in the structure and function of nailfold capillaries may be indicators of numerous diseases. Noninvasive diagnostic tools are commonly used for the extraction of morphological information from segmented nailfold capillaries to study physiological and pathological changes therein. However, current segmentation methods for nailfold capillaries cannot accurately separate capillaries from the background, resulting in issues such as unclear segmentation boundaries. Therefore, improving the accuracy of nailfold capillary segmentation is necessary to facilitate more efficient clinical diagnosis and research. Herein, we propose a nailfold capillary image segmentation method based on a U2-Net backbone network combined with a Transformer structure. This method integrates the U2-Net and Transformer networks to establish a decoder-encoder network, which inserts Transformer layers into the nested two-layer U-shaped architecture of the U2-Net. This structure effectively extracts multiscale features within stages and aggregates multilevel features across stages to generate high-resolution feature maps. The experimental results demonstrate an overall accuracy of 98.23%, a Dice coefficient of 88.56%, and an IoU of 80.41% compared to the ground truth. Furthermore, our proposed method improves the overall accuracy by approximately 2%, 3%, and 5% compared to the original U2-Net, Res-Unet, and U-Net, respectively. These results indicate that the Transformer-U2Net network performs well in nailfold capillary image segmentation and provides more detailed and accurate information on the segmented nailfold capillary structure, which may aid clinicians in the more precise diagnosis and treatment of nailfold capillary-related diseases.
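The Dice coefficient and IoU reported above are simple set-overlap measures on binary masks. A minimal NumPy implementation, with a toy shifted-square example standing in for a predicted capillary mask:

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def iou(pred, gt):
    """Intersection over union between two binary masks: |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

gt = np.zeros((8, 8), dtype=bool); gt[2:6, 2:6] = True      # 16-px "capillary"
pred = np.zeros((8, 8), dtype=bool); pred[3:7, 2:6] = True  # shifted by one row
d, j = dice(pred, gt), iou(pred, gt)
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why papers often report both.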
Affiliation(s)
- Qianyao Ye
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Hao Yin
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Jianan Lin
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Junzhao Liang
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Mugui Xie
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Cong Ye
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Bin Zhou
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- An Huang
- School of Mechatronic Engineering and Automation, Foshan University, Foshan 528000, China
- Zhiwei Wu
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Xiaosong Li
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China
- Yanxiong Wu
- School of Physics and Optoelectronic Engineering, Foshan University, Foshan 528000, China; Ji Hua Laboratory, Foshan, Guangdong 528200, China
182
Wang X, Lv Q, Chen G, Zhang J, Wei Z, Dong J, Fu H, Zhu Z, Liu J, Jin X. MobileSky: Real-Time Sky Replacement for Mobile AR. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2024; 30:4304-4320. [PMID: 37030763 DOI: 10.1109/tvcg.2023.3257840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
We present MobileSky, the first automatic method for real-time, high-quality sky replacement for mobile AR applications. The primary challenge of this task is extracting sky regions from the camera feed both quickly and accurately. While the problem of sky replacement is not new, previous methods mainly target extraction quality rather than efficiency, limiting their applicability to our task. We aim to provide high-quality, spatially and temporally consistent sky mask maps for all camera frames in real time. To this end, we develop a novel framework that combines a new deep semantic network, called FSNet, with novel post-processing refinement steps. By leveraging IMU data, we also propose new sky-aware constraints, such as temporal consistency, position consistency, and color consistency, to help refine the weakly classified parts of the segmentation output. Experiments show that our method achieves an average of around 30 FPS on off-the-shelf smartphones and outperforms state-of-the-art sky replacement methods in terms of execution speed and quality. Moreover, our mask maps are visually more stable across frames. Our fast sky replacement method enables several applications, such as AR advertising, art making, generating fantasy celestial objects, visually learning about weather phenomena, and advanced video-based visual effects. To facilitate future research, we also create a new video dataset containing annotated sky regions with IMU data.
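One generic form of temporal-consistency refinement is an exponential moving average over per-frame sky probabilities, which suppresses frame-to-frame flicker in the mask. This is a simplified sketch of the general idea, not FSNet or its IMU-based constraints.

```python
import numpy as np

def temporally_smooth(masks, alpha=0.6):
    """Exponential moving average over per-frame sky-probability maps:
    a simple stand-in for temporal-consistency refinement."""
    out, state = [], masks[0].astype(float)
    for m in masks:
        state = alpha * state + (1 - alpha) * m
        out.append(state)
    return np.stack(out)

# A pixel that flickers between sky (1) and non-sky (0) across five frames
flicker = np.array([[[1.0]], [[0.0]], [[1.0]], [[0.0]], [[1.0]]])
smooth = temporally_smooth(flicker)
jitter_raw = np.abs(np.diff(flicker[:, 0, 0])).max()
jitter_smooth = np.abs(np.diff(smooth[:, 0, 0])).max()
```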
183
Kim S, Park H, Kang M, Jin KH, Adeli E, Pohl KM, Park SH. Federated learning with knowledge distillation for multi-organ segmentation with partially labeled datasets. Med Image Anal 2024; 95:103156. [PMID: 38603844 DOI: 10.1016/j.media.2024.103156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2023] [Revised: 03/11/2024] [Accepted: 03/20/2024] [Indexed: 04/13/2024]
Abstract
The state-of-the-art multi-organ CT segmentation relies on deep learning models, which only generalize when trained on large samples of carefully curated data. However, it is challenging to train a single model that can segment all organs and types of tumors, since most large datasets are partially labeled or are acquired across multiple institutes that may differ in their acquisitions. A possible solution is federated learning, which is often used to train models on multi-institutional datasets where the data is not shared across sites. However, a federated model's predictions can become unreliable after it is locally updated at individual sites, due to 'catastrophic forgetting'. Here, we address this issue by using knowledge distillation (KD), so that local training is regularized with the knowledge of a global model and of pre-trained organ-specific segmentation models. We implement the models in a multi-head U-Net architecture that learns a shared embedding space for the different organ segmentations, thereby obtaining multi-organ predictions without repeated processing. We evaluate the proposed method using 8 publicly available abdominal CT datasets covering 7 different organs. Of those datasets, 889 CTs were used for training, 233 for internal testing, and 30 for external testing. Experimental results verified that our proposed method substantially outperforms other state-of-the-art methods in terms of accuracy, inference time, and the number of parameters.
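The core KD regularizer, a KL divergence that keeps the locally updated model close to the global model's softened predictions, can be written in a few lines. The temperature and toy logits below are illustrative assumptions, not the paper's training configuration.

```python
import numpy as np

def softmax(z, T=1.0):
    e = np.exp(z / T - (z / T).max())
    return e / e.sum()

def kd_loss(local_logits, global_logits, T=2.0):
    """KL divergence from the global model's temperature-softened predictions
    to the local model's: penalizes drift ('forgetting') during local updates."""
    p = softmax(global_logits, T)
    q = softmax(local_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

g = np.array([2.0, 0.5, -1.0])          # global model's logits for one voxel
same = kd_loss(g, g)                    # no drift, no penalty
drifted = kd_loss(np.array([-1.0, 0.5, 2.0]), g)  # class flip, large penalty
```

Adding this term to the local site's segmentation loss is what anchors each local update to the globally shared knowledge.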
Affiliation(s)
- Soopil Kim
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea; Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Heejung Park
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea
- Myeongkyun Kang
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea; Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Kyong Hwan Jin
- School of Electrical Engineering, Korea University, Republic of Korea
- Ehsan Adeli
- Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Kilian M Pohl
- Department of Psychiatry and Behavioral Sciences, Stanford University, CA 94305, USA
- Sang Hyun Park
- Department of Robotics and Mechatronics Engineering, Daegu Gyeongbuk Institute of Science and Technology, Republic of Korea
184
Kang S, Oh HS. Probabilistic Principal Curves on Riemannian Manifolds. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4843-4849. [PMID: 38265902 DOI: 10.1109/tpami.2024.3357801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2024]
Abstract
This paper studies a new curve-fitting approach to data on Riemannian manifolds. We define a principal curve based on a mixture model for observations and unobserved latent variables and propose a new algorithm to estimate the principal curve for given data points on Riemannian manifolds.
185
Zou J, Song Y, Liu L, Aviles-Rivero AI, Qin J. Unsupervised lung CT image registration via stochastic decomposition of deformation fields. Comput Med Imaging Graph 2024; 115:102397. [PMID: 38735104 DOI: 10.1016/j.compmedimag.2024.102397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2023] [Revised: 01/30/2024] [Accepted: 05/01/2024] [Indexed: 05/14/2024]
Abstract
We address the problem of lung CT image registration, which underpins various diagnoses and treatments for lung diseases. The crux of the problem is the large deformation that the lungs undergo during respiration. This physiological process imposes several challenges from a learning point of view. In this paper, we propose a novel training scheme, called stochastic decomposition, which enables deep networks to effectively learn such a difficult deformation field during lung CT image registration. The key idea is to stochastically decompose the deformation field and supervise the registration with synthetic data that exhibit the corresponding appearance discrepancy. The stochastic decomposition reveals all possible decompositions of the deformation field. At the learning level, these decompositions can be seen as a prior that reduces the ill-posedness of the registration and thereby boosts performance. We demonstrate the effectiveness of our framework on lung CT data and show, through extensive numerical and visual results, that our technique outperforms existing methods.
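The decomposition idea, splitting a deformation field into components that sum back to the original so that training can sample many partial deformations, can be sketched as follows. The per-voxel random fraction is one simple decomposition scheme, assumed here purely for illustration.

```python
import numpy as np

def stochastic_decompose(field, rng):
    """Split a dense deformation field into two random components that sum
    back to the original: u = u1 + u2, with u1 a random fraction per voxel."""
    frac = rng.uniform(0.0, 1.0, size=field.shape)
    u1 = frac * field
    return u1, field - u1

rng = np.random.default_rng(42)
u = rng.normal(scale=3.0, size=(2, 16, 16))  # toy 2D displacement field (dy, dx)
u1, u2 = stochastic_decompose(u, rng)        # one sampled decomposition of u
```

Resampling `frac` at every training step exposes the network to a different decomposition of the same large deformation each time.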
Collapse
Affiliation(s)
- Jing Zou
- Center for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China
| | - Youyi Song
- Department of Data Science, School of Science, China Pharmaceutical University, Nanjing, 210009, China
| | - Lihao Liu
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, UK
| | - Angelica I Aviles-Rivero
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, CB3 0WA, UK
| | - Jing Qin
- Center for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong, China.
| |
Collapse
|
186
|
Kolluru C, Joseph N, Seckler J, Fereidouni F, Levenson R, Shoffstall A, Jenkins M, Wilson D. NerveTracker: a Python-based software toolkit for visualizing and tracking groups of nerve fibers in serial block-face microscopy with ultraviolet surface excitation images. JOURNAL OF BIOMEDICAL OPTICS 2024; 29:076501. [PMID: 38912214 PMCID: PMC11188586 DOI: 10.1117/1.jbo.29.7.076501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 05/10/2024] [Accepted: 05/13/2024] [Indexed: 06/25/2024]
Abstract
Significance Information about the spatial organization of fibers within a nerve is crucial to our understanding of nerve anatomy and its response to neuromodulation therapies. A serial block-face microscopy method [three-dimensional microscopy with ultraviolet surface excitation (3D-MUSE)] has been developed to image nerves over extended depths ex vivo. To routinely visualize and track nerve fibers in these datasets, a dedicated and customizable software tool is required. Aim Our objective was to develop custom software that includes image processing and visualization methods to perform microscopic tractography along the length of a peripheral nerve sample. Approach We modified common computer vision algorithms (optic flow and structure tensor) to track groups of peripheral nerve fibers along the length of the nerve. Interactive streamline visualization and manual editing tools are provided. Optionally, deep learning segmentation of fascicles (fiber bundles) can be applied to constrain the tracts from inadvertently crossing into the epineurium. As an example, we performed tractography on vagus and tibial nerve datasets and assessed accuracy by comparing the resulting nerve tracts with segmentations of fascicles as they split and merge with each other in the nerve sample stack. Results We found that a normalized Dice overlap (Dice_norm) metric had a mean value above 0.75 across several millimeters along the nerve. We also found that the tractograms were robust to changes in certain image properties (e.g., downsampling in-plane and out-of-plane), which resulted in only a 2% to 9% change to the mean Dice_norm values. In a vagus nerve sample, tractography allowed us to readily identify that subsets of fibers from four distinct fascicles merge into a single fascicle as we move ∼5 mm along the nerve's length. Conclusions Overall, we demonstrated the feasibility of performing automated microscopic tractography on 3D-MUSE datasets of peripheral nerves.
The software should be applicable to other imaging approaches. The code is available at https://github.com/ckolluru/NerveTracker.
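Of the two tracking ingredients named in the abstract, the structure tensor is the easier to sketch: smooth the outer products of the image gradients and read the dominant orientation off the principal eigenvector. A minimal version, with a box blur standing in for the toolkit's actual filtering (function names are illustrative):

```python
import numpy as np

def box_blur(a, passes=4):
    """Crude separable smoothing by repeated 4-neighbour averaging."""
    for _ in range(passes):
        a = (a + np.roll(a, 1, 0) + np.roll(a, -1, 0)
               + np.roll(a, 1, 1) + np.roll(a, -1, 1)) / 5.0
    return a

def dominant_orientation(img):
    """Dominant gradient orientation (radians) from the structure tensor.

    J = [[<Ix Ix>, <Ix Iy>], [<Ix Iy>, <Iy Iy>]], smoothed componentwise;
    the principal-eigenvector angle is 0.5 * atan2(2 Jxy, Jxx - Jyy).
    """
    Iy, Ix = np.gradient(img.astype(float))   # axis 0 = rows (y), axis 1 = cols (x)
    Jxx = box_blur(Ix * Ix).mean()
    Jyy = box_blur(Iy * Iy).mean()
    Jxy = box_blur(Ix * Iy).mean()
    return 0.5 * np.arctan2(2.0 * Jxy, Jxx - Jyy)
```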
Collapse
Affiliation(s)
- Chaitanya Kolluru
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
| | - Naomi Joseph
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
| | - James Seckler
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
| | - Farzad Fereidouni
- UC Davis Medical Center, Department of Pathology and Laboratory Medicine, Sacramento, California, United States
| | - Richard Levenson
- UC Davis Medical Center, Department of Pathology and Laboratory Medicine, Sacramento, California, United States
| | - Andrew Shoffstall
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Louis Stokes Cleveland VA Medical Center, Cleveland, Ohio, United States
| | - Michael Jenkins
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Louis Stokes Cleveland VA Medical Center, Cleveland, Ohio, United States
- Case Western Reserve University, Department of Pediatrics, Cleveland, Ohio, United States
| | - David Wilson
- Case Western Reserve University, Department of Biomedical Engineering, Cleveland, Ohio, United States
- Case Western Reserve University, Department of Radiology, Cleveland, Ohio, United States
| |
Collapse
|
187
|
António J, Valente J, Mora C, Almeida A, Jardim S. DarwinGSE: Towards better image retrieval systems for intellectual property datasets. PLoS One 2024; 19:e0304915. [PMID: 38950045 PMCID: PMC11216576 DOI: 10.1371/journal.pone.0304915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2023] [Accepted: 05/20/2024] [Indexed: 07/03/2024] Open
Abstract
A trademark's image is usually the first type of indirect contact between a consumer and a product or a service. Companies rely on graphical trademarks as a symbol of quality and instant recognition, seeking to protect them from copyright infringements. A popular defense mechanism is graphical searching, where an image is compared to a large database to find potential conflicts with similar trademarks. Although image retrieval is not a new subject, the state of the art lacks reliable solutions in the Industrial Property (IP) sector, where datasets are practically unrestricted in content, with abstract images for which modeling human perception is a challenging task. Existing Content-based Image Retrieval (CBIR) systems still present several problems, particularly in terms of efficiency and reliability. In this paper, we propose a new CBIR system that overcomes these major limitations. It follows a modular methodology, composed of a set of individual components tasked with the retrieval, maintenance and gradual optimization of trademark image searching, working on large-scale, unlabeled datasets. Its generalization capacity is achieved using multiple feature descriptions, weighted separately, and combined to represent a single similarity score. Images are evaluated for general features, edge maps, and regions of interest, using a method based on Watershedding K-Means segments. We propose an image recovery process that relies on a new similarity measure between all feature descriptions. New trademark images are added every day to ensure up-to-date results. The proposed system showcases timely retrieval, with 95% of searches returning results within 10 seconds and a mean average precision of 93.7%, supporting its applicability to real-world IP protection scenarios.
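The weighted fusion of per-descriptor similarities into a single score can be sketched as follows; the descriptor names, the cosine-similarity choice, and the weight normalisation are illustrative assumptions, not the paper's actual measure.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def combined_similarity(query_feats, cand_feats, weights):
    """Weighted fusion of per-descriptor similarities into one score.

    query_feats / cand_feats: dict name -> vector (e.g. 'global', 'edges',
    'regions'); weights: dict name -> non-negative weight, normalised here.
    """
    total = sum(weights.values())
    return sum(weights[k] / total * cosine(query_feats[k], cand_feats[k])
               for k in query_feats)
```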
Collapse
Affiliation(s)
- João António
- Techframe-Information Systems, SA, São Domingos de Rana, Portugal
| | - Jorge Valente
- Techframe-Information Systems, SA, São Domingos de Rana, Portugal
| | - Carlos Mora
- Smart Cities Research Center, Polytechnic Institute of Tomar, Tomar, Portugal
| | - Artur Almeida
- Techframe-Information Systems, SA, São Domingos de Rana, Portugal
| | - Sandra Jardim
- Smart Cities Research Center, Polytechnic Institute of Tomar, Tomar, Portugal
| |
Collapse
|
188
|
Day TG, Matthew J, Budd SF, Venturini L, Wright R, Farruggia A, Vigneswaran TV, Zidere V, Hajnal JV, Razavi R, Simpson JM, Kainz B. Interaction between clinicians and artificial intelligence to detect fetal atrioventricular septal defects on ultrasound: how can we optimize collaborative performance? ULTRASOUND IN OBSTETRICS & GYNECOLOGY : THE OFFICIAL JOURNAL OF THE INTERNATIONAL SOCIETY OF ULTRASOUND IN OBSTETRICS AND GYNECOLOGY 2024; 64:28-35. [PMID: 38197584 DOI: 10.1002/uog.27577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 12/19/2023] [Accepted: 12/30/2023] [Indexed: 01/11/2024]
Abstract
OBJECTIVES Artificial intelligence (AI) has shown promise in improving the performance of fetal ultrasound screening in detecting congenital heart disease (CHD). The effect of giving AI advice to human operators has not been studied in this context. Giving additional information about AI model workings, such as confidence scores for AI predictions, may be a way of further improving performance. Our aims were to investigate whether AI advice improved overall diagnostic accuracy (using a single CHD lesion as an exemplar), and to determine what, if any, additional information given to clinicians optimized the overall performance of the clinician-AI team. METHODS An AI model was trained to classify a single fetal CHD lesion (atrioventricular septal defect (AVSD)), using a retrospective cohort of 121 130 cardiac four-chamber images extracted from 173 ultrasound scan videos (98 with normal hearts, 75 with AVSD); a ResNet50 model architecture was used. Temperature scaling of model prediction probability was performed on a validation set, and gradient-weighted class activation maps (grad-CAMs) produced. Ten clinicians (two consultant fetal cardiologists, three trainees in pediatric cardiology and five fetal cardiac sonographers) were recruited from a center of fetal cardiology to participate. Each participant was shown 2000 fetal four-chamber images in a random order (1000 normal and 1000 AVSD). The dataset comprised 500 images, each shown in four conditions: (1) image alone without AI output; (2) image with binary AI classification; (3) image with AI model confidence; and (4) image with grad-CAM image overlays. The clinicians were asked to classify each image as normal or AVSD. RESULTS A total of 20 000 image classifications were recorded from 10 clinicians. 
The AI model alone achieved an accuracy of 0.798 (95% CI, 0.760-0.832), a sensitivity of 0.868 (95% CI, 0.834-0.902) and a specificity of 0.728 (95% CI, 0.702-0.754), and the clinicians without AI achieved an accuracy of 0.844 (95% CI, 0.834-0.854), a sensitivity of 0.827 (95% CI, 0.795-0.858) and a specificity of 0.861 (95% CI, 0.828-0.895). Showing a binary (normal or AVSD) AI model output resulted in significant improvement in accuracy to 0.865 (P < 0.001). This effect was seen in both experienced and less-experienced participants. Giving incorrect AI advice resulted in a significant deterioration in overall accuracy, from 0.761 to 0.693 (P < 0.001), which was driven by an increase in both Type-I and Type-II errors by the clinicians. This effect was worsened by showing model confidence (accuracy, 0.649; P < 0.001) or grad-CAM (accuracy, 0.644; P < 0.001). CONCLUSIONS AI has the potential to improve performance when used in collaboration with clinicians, even if the model performance does not reach expert level. Giving additional information about model workings such as model confidence and class activation map image overlays did not improve overall performance, and actually worsened performance for images for which the AI model was incorrect. © 2024 The Authors. Ultrasound in Obstetrics & Gynecology published by John Wiley & Sons Ltd on behalf of International Society of Ultrasound in Obstetrics and Gynecology.
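Temperature scaling, used in the study to expose model confidence, fits a single scalar T on validation logits so that softmax confidence matches observed accuracy without changing the predicted class. A minimal sketch, with grid search standing in for the usual gradient-based fit:

```python
import numpy as np

def nll(logits, labels, T):
    """Mean negative log-likelihood of temperature-scaled softmax outputs."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Pick the single temperature minimising validation NLL."""
    grid = np.linspace(0.25, 10.0, 400)
    return float(grid[np.argmin([nll(logits, labels, T) for T in grid])])
```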
Collapse
Affiliation(s)
- T G Day
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
| | - J Matthew
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - S F Budd
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - L Venturini
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - R Wright
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - A Farruggia
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - T V Vigneswaran
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
| | - V Zidere
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Harris Birthright Research Centre, King's College London NHS Foundation Trust, London, UK
| | - J V Hajnal
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
| | - R Razavi
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
| | - J M Simpson
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
| | - B Kainz
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität, Erlangen-Nürnberg, Germany
- Department of Computing, Faculty of Engineering, Imperial College London, London, UK
| |
Collapse
|
189
|
Sung C, Oh JS, Park BS, Kim SS, Song SY, Lee JJ. Diagnostic performance of a deep-learning model using 18F-FDG PET/CT for evaluating recurrence after radiation therapy in patients with lung cancer. Ann Nucl Med 2024; 38:516-524. [PMID: 38589677 DOI: 10.1007/s12149-024-01925-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Accepted: 03/21/2024] [Indexed: 04/10/2024]
Abstract
OBJECTIVE We developed a deep learning model for distinguishing radiation therapy (RT)-related changes and tumour recurrence in patients with lung cancer who underwent RT, and evaluated its performance. METHODS We retrospectively recruited 308 patients with lung cancer with RT-related changes observed on 18F-fluorodeoxyglucose positron emission tomography-computed tomography (18F-FDG PET/CT) performed after RT. Patients were labelled as positive or negative for tumour recurrence through histologic diagnosis or clinical follow-up after 18F-FDG PET/CT. A two-dimensional (2D) slice-based convolutional neural network (CNN) model was created with a total of 3329 slices as input, and performance was evaluated with five independent test sets. RESULTS For the five independent test sets, the area under the curve (AUC) of the receiver operating characteristic curve, sensitivity, and specificity were in the range of 0.98-0.99, 95-98%, and 87-95%, respectively. The region identified by the model was confirmed as the actual recurrent tumour through explainable artificial intelligence (AI) using gradient-weighted class activation mapping (Grad-CAM). CONCLUSION The 2D slice-based CNN model using 18F-FDG PET imaging was able to distinguish well between RT-related changes and tumour recurrence in patients with lung cancer.
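The reported AUC can be computed directly from per-slice scores via the rank-sum (Mann-Whitney) identity, without building the ROC curve point by point. A minimal sketch (ties are broken arbitrarily here, unlike a careful implementation):

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the probability that a random positive slice outscores a
    random negative one (rank-sum identity; ties broken arbitrarily)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```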
Collapse
Affiliation(s)
- Changhwan Sung
- Department of Nuclear Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, Korea
| | - Jungsu S Oh
- Department of Nuclear Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, Korea
| | - Byung Soo Park
- Department of Nuclear Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, Korea
| | - Su Ssan Kim
- Department of Radiation Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
| | - Si Yeol Song
- Department of Radiation Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
| | - Jong Jin Lee
- Department of Nuclear Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-Ro 43-Gil, Songpa-Gu, Seoul, 05505, Korea.
| |
Collapse
|
190
|
Tian X, Zhang Z, Wang C, Zhang W, Qu Y, Ma L, Wu Z, Xie Y, Tao D. Variational Distillation for Multi-View Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4551-4566. [PMID: 38133979 DOI: 10.1109/tpami.2023.3343717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2023]
Abstract
Information Bottleneck (IB) provides an information-theoretic principle for multi-view learning by revealing the various components contained in each viewpoint. This highlights the necessity of capturing their distinct roles to achieve view-invariant and predictive representations, which remains under-explored due to the technical intractability of modeling and organizing innumerable mutual information (MI) terms. Recent studies show that sufficiency and consistency play such key roles in multi-view representation learning, and could be preserved via a variational distillation framework. But when generalized to arbitrary viewpoints, such a strategy fails, as the mutual information terms of consistency become complicated. This paper presents Multi-View Variational Distillation (MV²D), tackling the above limitations for generalized multi-view learning. Uniquely, MV²D can recognize useful consistent information and prioritize diverse components by their generalization ability. This guides an analytical and scalable solution to achieving both sufficiency and consistency. Additionally, by rigorously reformulating the IB objective, MV²D tackles the difficulties in MI optimization and fully realizes the theoretical advantages of the information bottleneck principle. We extensively evaluate our model on diverse tasks to verify its effectiveness, where the considerable gains provide key insights into achieving generalized multi-view representations under a rigorous information-theoretic principle.
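For reference, the classic information-bottleneck objective the abstract builds on, stated in the notation of the IB literature rather than this paper's:

```latex
% A stochastic encoder p(z|x) is sought that compresses the view X while
% staying predictive of the target Y; beta trades the two terms off:
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
% In the multi-view setting with views X_1, X_2, sufficiency asks
% I(Z; Y) = I(X; Y), while consistency asks Z to retain the information
% shared across views, related to I(X_1; X_2); the abstract's point is that
% organizing these MI terms tractably for arbitrary numbers of views is hard.
```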
Collapse
|
191
|
Rekik W, Le Hégarat-Mascle S, Ezzedini S, de Marco G. Detection of atypical attentional behaviors in young subjects. J Neurosci Methods 2024; 407:110141. [PMID: 38641265 DOI: 10.1016/j.jneumeth.2024.110141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 04/09/2024] [Accepted: 04/16/2024] [Indexed: 04/21/2024]
Abstract
BACKGROUND Vigilance ability refers to the accuracy and speed with which a person performs a cognitive-motor task, either voluntarily (endogenous mode) or following a warning stimulus (exogenous mode). In the context of a force production task, our study focuses on the impact of the states of vigilance by proposing an original approach that allows distinguishing between good (inlier) and poor (outlier) participants. We assume that the use of an external signal and duration of the temporal preparation (foreperiod) increase the speed and the precision of motor responses. Our objective is particularly challenging in the context of a limited dataset with a high level of noise. NEW METHOD Our original methodological approach consists of coupling the RANSAC (RANdom SAmple Consensus) algorithm with a statistical machine learning algorithm to handle noise. COMPARISON WITH EXISTING METHODS Our clustering approach, based on the coupling of RANSAC methodology with ensemble classifiers, overcomes the limitations of conventional supervised algorithms that are not robust to outliers (such as K-Nearest Neighbors) and/or not adapted to few-shot learning (such as Support Vector Machines and Artificial Neural Networks). RESULTS The clustering results were validated in terms of reaction time distributions and force error distributions with respect to participant groups. We show that the use of an external signal and duration of the temporal preparation (foreperiod) increase the speed and the precision of motor responses. CONCLUSION Our study has allowed us to detect atypical attentional patterns and succeeds in separating the inliers from the outliers.
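The RANSAC side of the coupling is easy to sketch: repeatedly fit a model to minimal random samples and keep the largest consensus set, flagging everything outside it as outliers. A minimal line-fitting version (the paper couples this with ensemble classifiers; the threshold and variable names here are illustrative):

```python
import numpy as np

def ransac_line(x, y, n_iter=200, tol=2.0, rng=None):
    """Minimal RANSAC line fit returning a boolean inlier mask.

    Each iteration fits a line through two random points and counts how
    many points fall within `tol` residual; the largest consensus set wins.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue                      # degenerate vertical sample
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```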
Collapse
Affiliation(s)
- Wafa Rekik
- Research Laboratory COSIM, Higher School of Communications of Tunis, University of Carthage, Route de Raoued 3.5 Km, Cité El Ghazala, Ariana 2088, Tunisia.
| | | | | | | |
Collapse
|
192
|
Cho SM, Joo HH, Golla P, Sahu M, Shankar A, Trakimas DR, Creighton F, Akst L, Taylor RH, Galaiya D. Tremor Assessment in Robot-Assisted Microlaryngeal Surgery Using Computer Vision-Based Tool Tracking. Otolaryngol Head Neck Surg 2024; 171:188-196. [PMID: 38488231 PMCID: PMC11211051 DOI: 10.1002/ohn.714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2023] [Revised: 01/30/2024] [Accepted: 02/09/2024] [Indexed: 04/16/2024]
Abstract
OBJECTIVE Use microscopic video-based tracking of laryngeal surgical instruments to investigate the effect of robot assistance on instrument tremor. STUDY DESIGN Experimental trial. SETTING Tertiary Academic Medical Center. METHODS In this randomized cross-over trial, 36 videos were recorded from 6 surgeons performing left and right cordectomies on cadaveric pig larynges. These recordings captured 3 distinct conditions: without robotic assistance, with robot-assisted scissors, and with robot-assisted graspers. To assess tool tremor, we employed computer vision-based algorithms for tracking surgical tools. Absolute tremor bandpower and normalized path length were utilized as quantitative measures. Wilcoxon rank sum exact tests were employed for statistical analyses and comparisons between trials. Additionally, surveys were administered to assess the perceived ease of use of the robotic system. RESULTS Absolute tremor bandpower showed a significant decrease when using robot-assisted instruments compared to freehand instruments (P = .012). Normalized path length significantly decreased with robot-assisted compared to freehand trials (P = .001). For the scissors, robot-assisted trials resulted in a significant decrease in absolute tremor bandpower (P = .002) and normalized path length (P < .001). For the graspers, there was no significant difference in absolute tremor bandpower (P = .4), but there was a significantly lower normalized path length in the robot-assisted trials (P = .03). CONCLUSION This study demonstrated that computer-vision-based approaches can be used to assess tool motion in simulated microlaryngeal procedures. The results suggest that robot assistance is capable of reducing instrument tremor.
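The two quantitative measures can be sketched under common definitions; the band limits and normalisation below are assumptions for illustration, and the paper's exact formulas may differ.

```python
import numpy as np

def band_power(signal, fs, f_lo=4.0, f_hi=12.0):
    """Fraction of periodogram power in an assumed tremor band (4-12 Hz)."""
    sig = signal - np.mean(signal)
    psd = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return psd[band].sum() / (psd.sum() + 1e-12)

def normalized_path_length(xy, fs):
    """Tool-tip path length per second of tracked (N, 2) positions
    (one common normalisation; others divide by straight-line distance)."""
    steps = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    return steps.sum() * fs / len(xy)
```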
Collapse
Affiliation(s)
- Sue M. Cho
- Department of Computer Science, Johns Hopkins, Baltimore, Maryland, USA
| | - Henry H. Joo
- Department of Otolaryngology–Head & Neck Surgery, Johns Hopkins, Baltimore, Maryland, USA
| | - Pranathi Golla
- Department of Mechanical Engineering, Johns Hopkins, Baltimore, Maryland, USA
| | - Manish Sahu
- Department of Computer Science, Johns Hopkins, Baltimore, Maryland, USA
| | - Ahjeetha Shankar
- Department of Otolaryngology–Head & Neck Surgery, Johns Hopkins, Baltimore, Maryland, USA
| | - Danielle R. Trakimas
- Department of Otolaryngology–Head & Neck Surgery, Johns Hopkins, Baltimore, Maryland, USA
| | - Francis Creighton
- Department of Otolaryngology–Head & Neck Surgery, Johns Hopkins, Baltimore, Maryland, USA
| | - Lee Akst
- Department of Otolaryngology–Head & Neck Surgery, Johns Hopkins, Baltimore, Maryland, USA
| | - Russell H. Taylor
- Department of Computer Science, Johns Hopkins, Baltimore, Maryland, USA
| | - Deepa Galaiya
- Department of Otolaryngology–Head & Neck Surgery, Johns Hopkins, Baltimore, Maryland, USA
| |
Collapse
|
193
|
Teufel T, Shu H, Soberanis-Mukul RD, Mangulabnan JE, Sahu M, Vedula SS, Ishii M, Hager G, Taylor RH, Unberath M. OneSLAM to map them all: a generalized approach to SLAM for monocular endoscopic imaging based on tracking any point. Int J Comput Assist Radiol Surg 2024; 19:1259-1266. [PMID: 38775904 DOI: 10.1007/s11548-024-03171-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Accepted: 04/30/2024] [Indexed: 07/10/2024]
Abstract
PURPOSE Monocular SLAM algorithms are the key enabling technology for image-based surgical navigation systems for endoscopic procedures. Due to the visual feature scarcity and unique lighting conditions encountered in endoscopy, classical SLAM approaches perform inconsistently. Many of the recent approaches to endoscopic SLAM rely on deep learning models. They show promising results when optimized on singular domains such as arthroscopy, sinus endoscopy, colonoscopy or laparoscopy, but are limited by an inability to generalize to different domains without retraining. METHODS To address this generality issue, we propose OneSLAM, a monocular SLAM algorithm for surgical endoscopy that works out of the box across several endoscopic domains, including sinus endoscopy, colonoscopy, arthroscopy and laparoscopy. Our pipeline builds upon robust tracking-any-point (TAP) foundation models to reliably track sparse correspondences across multiple frames and runs local bundle adjustment to jointly optimize camera poses and a sparse 3D reconstruction of the anatomy. RESULTS We compare the performance of our method against three strong baselines previously proposed for monocular SLAM in endoscopy and general scenes. In all four tested domains, OneSLAM matches or outperforms existing approaches tailored to that specific data, generalizing across domains without the need for retraining. CONCLUSION OneSLAM benefits from the convincing performance of TAP foundation models and generalizes to endoscopic sequences of different anatomies, all while matching or outperforming domain-specific SLAM approaches. Future research on global loop closure will investigate how to reliably detect loops in endoscopic scenes to reduce accumulated drift and enhance long-term navigation capabilities.
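The sparse-correspondence stage can be illustrated with a toy tracker: follow one point through a frame sequence by correlating a template around its last known position. In the actual pipeline a TAP foundation model replaces this brute-force correlation search; all names and window sizes here are illustrative.

```python
import numpy as np

def track_template(frames, init_yx, half=5, search=3):
    """Track one point through frames by exhaustive template matching.

    A zero-mean template from frame 0 is correlated against patches in a
    small search window around the previous estimate; the best-scoring
    offset becomes the new position.
    """
    y, x = init_yx
    tmpl = frames[0][y - half:y + half + 1, x - half:x + half + 1].astype(float)
    tmpl = tmpl - tmpl.mean()
    track = [(y, x)]
    for f in frames[1:]:
        best, best_yx = -np.inf, (y, x)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                patch = f[yy - half:yy + half + 1,
                          xx - half:xx + half + 1].astype(float)
                score = ((patch - patch.mean()) * tmpl).sum()
                if score > best:
                    best, best_yx = score, (yy, xx)
        y, x = best_yx
        track.append((y, x))
    return track
```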
Collapse
Affiliation(s)
- Timo Teufel
- Johns Hopkins University, Baltimore, MD, 21211, USA.
| | - Hongchao Shu
- Johns Hopkins University, Baltimore, MD, 21211, USA
| | | | | | - Manish Sahu
- Johns Hopkins University, Baltimore, MD, 21211, USA
| | | | - Masaru Ishii
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
| | | | - Russell H Taylor
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
| | - Mathias Unberath
- Johns Hopkins University, Baltimore, MD, 21211, USA
- Johns Hopkins Medical Institutions, Baltimore, MD, 21287, USA
| |
Collapse
|
194
|
Tian H, Zhang B, Zhang Z, Xu Z, Jin L, Bian Y, Wu J. DenseNet model incorporating hybrid attention mechanisms and clinical features for pancreatic cystic tumor classification. J Appl Clin Med Phys 2024; 25:e14380. [PMID: 38715381 PMCID: PMC11244679 DOI: 10.1002/acm2.14380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2023] [Revised: 02/18/2024] [Accepted: 04/15/2024] [Indexed: 07/14/2024] Open
Abstract
PURPOSE The aim of this study is to develop a deep learning model capable of discriminating between pancreatic serous cystic neoplasms (SCN) and mucinous cystic neoplasms (MCN) by leveraging patient-specific clinical features and imaging outcomes. The intent is to offer valuable diagnostic support to clinicians in their clinical decision-making processes. METHODS The construction of the deep learning model involved utilizing a dataset comprising abdominal magnetic resonance T2-weighted images obtained from patients diagnosed with pancreatic cystic tumors at Changhai Hospital. The dataset comprised 207 patients with SCN and 93 patients with MCN, encompassing a total of 1761 images. The foundational architecture employed was DenseNet-161, augmented with a hybrid attention mechanism module. This integration aimed to enhance the network's attentiveness toward channel and spatial features, thereby amplifying its performance. Additionally, clinical features were incorporated prior to the fully connected layer of the network to actively contribute to subsequent decision-making processes, thereby significantly augmenting the model's classification accuracy. The final patient classification outcomes were derived using a joint voting methodology, and the model underwent comprehensive evaluation. RESULTS Using five-fold cross validation, the accuracy of the classification model in this paper was 92.44%, with an AUC value of 0.971, a precision rate of 0.956, a recall rate of 0.919, a specificity of 0.933, and an F1-score of 0.936. CONCLUSION This study demonstrates that the DenseNet model, which incorporates hybrid attention mechanisms and clinical features, is effective for distinguishing between SCN and MCN, and has potential application for the diagnosis of pancreatic cystic tumors in clinical practice.
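The two fusion ideas, hybrid (channel plus spatial) attention and concatenating clinical covariates before the classifier head, can be sketched with plain arrays. The gating forms and weight shapes below are assumptions standing in for trained parameters, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hybrid_attention(feat, w_ch, w_sp):
    """Channel-then-spatial attention over a (C, H, W) feature map.

    Squeeze-and-excite-style channel gates from global average pooling,
    then a spatial gate from the channel mean; w_ch (C, C) and the scalar
    w_sp are placeholders for learned parameters.
    """
    ch = sigmoid(w_ch @ feat.mean(axis=(1, 2)))   # (C,) channel gates
    feat = feat * ch[:, None, None]
    sp = sigmoid(w_sp * feat.mean(axis=0))        # (H, W) spatial gate
    return feat * sp[None, :, :]

def fuse_with_clinical(feat, clinical):
    """Concatenate pooled image features with clinical covariates before
    the fully connected classifier head, as the abstract describes."""
    return np.concatenate([feat.mean(axis=(1, 2)), clinical])
```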
Collapse
Affiliation(s)
- Hui Tian
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Bo Zhang
- School of Medical Technology, Binzhou Polytechnic, Shandong, China
| | - Zhiwei Zhang
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Zhenshun Xu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Liang Jin
- Department of Radiology, Huadong Hospital, Fudan University, Shanghai, China
| | - Yun Bian
- Department of Radiology, Changhai Hospital, The Navy Military Medical University, Shanghai, China
| | - Jie Wu
- School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, China
| |
Collapse
|
195
|
Vun DSY, Bowers R, McGarry A. Vision-based motion capture for the gait analysis of neurodegenerative diseases: A review. Gait Posture 2024; 112:95-107. [PMID: 38754258 DOI: 10.1016/j.gaitpost.2024.04.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Revised: 04/25/2024] [Accepted: 04/26/2024] [Indexed: 05/18/2024]
Abstract
BACKGROUND Developments in vision-based systems and human pose estimation algorithms have the potential to support the detection, monitoring and early treatment of neurodegenerative diseases through gait analysis. However, the gap between the technology available and actual clinical practice is evident, as most clinicians still rely on subjective observational gait analysis or objective marker-based analysis that is time-consuming. RESEARCH QUESTION This paper aims to examine the main developments of vision-based motion capture and how such advances may be integrated into clinical practice. METHODS The literature review was conducted in six online databases using Boolean search terms. A commercial system search was also included. A predetermined methodological criterion was then used to assess the quality of the selected articles. RESULTS A total of seventeen studies were evaluated, with thirteen studies focusing on gait classification systems and four studies on gait measurement systems. Of the gait classification systems, nine studies utilized artificial intelligence-assisted techniques, while four studies employed statistical techniques. The results revealed high correlations of gait features identified by classifier models with existing clinical rating scales. These systems demonstrated generally high classification accuracies and were effective in diagnosing disease severity levels. Gait measurement systems that extract spatiotemporal and kinematic joint information from video data generally produced accurate measurements of gait parameters with low mean absolute errors and high intra- and inter-rater reliability. SIGNIFICANCE Low cost, portable vision-based systems can provide proof of concept for the quantification of gait, expansion of gait assessment tools, remote gait analysis of neurodegenerative diseases and a point of care system for orthotic evaluation.
However, certain challenges, including small sample sizes, occlusion risks, and selection bias in training models, need to be addressed. Nevertheless, these systems can serve as complementary tools, equipping clinicians with essential gait information to objectively assess disease severity and tailor personalized treatment for enhanced patient care.
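The gait measurement systems above derive spatiotemporal parameters from pose-estimation keypoints. As an illustration only (the frame rate, the single-ankle keypoint format, and the heel-strike heuristic below are assumptions, not taken from any reviewed system), stride time and cadence might be sketched from an ankle trajectory like this:

```python
import numpy as np

def heel_strikes(ankle_y, fps, min_gap_s=0.4):
    """Approximate foot-contact frames as local maxima of the ankle's
    image-space y coordinate (y grows downward, so ground contact is
    the lowest visible point), at least min_gap_s apart."""
    y = np.asarray(ankle_y, dtype=float)
    peaks = [i for i in range(1, len(y) - 1)
             if y[i] >= y[i - 1] and y[i] > y[i + 1]]
    gap, contacts = int(min_gap_s * fps), []
    for p in peaks:
        if not contacts or p - contacts[-1] >= gap:
            contacts.append(p)
    return contacts

def gait_parameters(ankle_y, fps):
    hs = heel_strikes(ankle_y, fps)
    stride_s = np.diff(hs) / fps            # time between same-foot contacts
    return {"stride_time_s": float(stride_s.mean()),
            "cadence_spm": float(60.0 / stride_s.mean() * 2)}  # assumes symmetric gait
```

On a synthetic 1 Hz ankle oscillation sampled at 30 fps this recovers a 1.0 s stride time and a cadence of 120 steps/min.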
Affiliation(s)
- David Sing Yee Vun, National Centre for Prosthetics and Orthotics, Department of Biomedical Engineering, University of Strathclyde, Glasgow, UK
- Robert Bowers, National Centre for Prosthetics and Orthotics, Department of Biomedical Engineering, University of Strathclyde, Glasgow, UK
- Anthony McGarry, National Centre for Prosthetics and Orthotics, Department of Biomedical Engineering, University of Strathclyde, Glasgow, UK
196
Tzitzimpasis P, Ries M, Raaymakers BW, Zachiu C. Generalized div-curl based regularization for physically constrained deformable image registration. Sci Rep 2024; 14:15002. PMID: 38951683. PMCID: PMC11217375. DOI: 10.1038/s41598-024-65896-3.
Abstract
Variational image registration methods commonly employ a similarity metric together with a regularization term that renders the minimization problem well-posed. However, many frequently used regularizations, such as smoothness or curvature, do not necessarily reflect the physics underlying anatomical deformations, which can make the accurate estimation of complex deformations particularly challenging. Here, we present a new, highly flexible regularization inspired by the physics of fluid dynamics that allows independent penalties to be applied to the divergence and curl of the deformations and/or their nth-order derivatives. The complexity of the proposed generalized div-curl regularization makes the problem particularly challenging for conventional optimization techniques. To this end, we develop a transformation model and an optimization scheme that use the divergence and curl components of the deformation as control parameters for the registration. We demonstrate that the original unconstrained minimization problem reduces to a constrained problem, for which we propose the augmented Lagrangian method. In doing so, the equations of motion simplify greatly and become manageable. Our experiments indicate that the proposed framework can be applied to a variety of registration problems and produces highly accurate deformations with the desired physical properties.
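The quantities this regularization penalizes are standard vector-calculus operators. A minimal numerical sketch (grid spacing, weights, and function names are illustrative assumptions, not the paper's implementation) of independent divergence and curl penalties for a 2-D deformation field u = (ux, uy):

```python
import numpy as np

def div_curl(ux, uy, h=1.0):
    """Divergence and scalar curl of a 2-D vector field on a regular grid,
    via central finite differences (axis 0 = y, axis 1 = x)."""
    dux_dx = np.gradient(ux, h, axis=1)
    dux_dy = np.gradient(ux, h, axis=0)
    duy_dx = np.gradient(uy, h, axis=1)
    duy_dy = np.gradient(uy, h, axis=0)
    return dux_dx + duy_dy, duy_dx - dux_dy

def div_curl_penalty(ux, uy, w_div=1.0, w_curl=1.0, h=1.0):
    # independent quadratic penalties on compression (div) and rotation (curl)
    d, c = div_curl(ux, uy, h)
    return w_div * np.mean(d**2) + w_curl * np.mean(c**2)
```

For a rigid rotation field (ux, uy) = (-y, x) the divergence vanishes and the curl is 2 everywhere, so a div-only penalty leaves such deformations unpunished while a curl penalty does not.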
Affiliation(s)
- Paris Tzitzimpasis, Department of Radiotherapy, UMC Utrecht, 3584 CX, Utrecht, The Netherlands
- Mario Ries, Imaging Division, UMC Utrecht, 3584 CX, Utrecht, The Netherlands
- Bas W Raaymakers, Department of Radiotherapy, UMC Utrecht, 3584 CX, Utrecht, The Netherlands
- Cornel Zachiu, Department of Radiotherapy, UMC Utrecht, 3584 CX, Utrecht, The Netherlands
197
Thandiackal K, Piccinelli L, Gupta R, Pati P, Goksel O. Multi-Scale Feature Alignment for Continual Learning of Unlabeled Domains. IEEE Trans Med Imaging 2024; 43:2599-2609. PMID: 38381642. DOI: 10.1109/tmi.2024.3368365.
Abstract
Methods for unsupervised domain adaptation (UDA) help to improve the performance of deep neural networks on unseen domains without any labeled data. This is crucial especially in medical disciplines such as histopathology, where large datasets with detailed annotations are scarce. While the majority of existing UDA methods focus on adaptation from a labeled source to a single unlabeled target domain, many real-world applications with a long life cycle involve more than one target domain. Thus, the ability to sequentially adapt to multiple target domains becomes essential. In settings where the data from previously seen domains cannot be stored, e.g., due to data protection regulations, this becomes a challenging continual learning problem. To this end, we propose to use generative feature-driven image replay in conjunction with a dual-purpose discriminator that not only enables the generation of images with realistic features for replay, but also promotes feature alignment during domain adaptation. We evaluate our approach extensively on a sequence of three histopathological datasets for tissue-type classification, achieving state-of-the-art results. We present detailed ablation experiments studying our proposed method components and demonstrate a possible use case of our continual UDA method for an unsupervised patch-based segmentation task on high-resolution tissue images. Our code is available at: https://github.com/histocartography/multi-scale-feature-alignment.
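The "dual-purpose" discriminator plays two adversarial roles: judging replayed features as real vs. generated, and judging features as source- vs. target-like. A schematic sketch of those objectives (loss names, argument conventions, and the exact pairing of terms are assumptions; see the linked repository for the authors' actual code):

```python
import numpy as np

def bce(logits, label):
    """Numerically stable binary cross-entropy against a constant 0/1 label:
    log(1 + e^x) - label*x equals -log sigmoid(x) for label=1 and
    -log(1 - sigmoid(x)) for label=0."""
    return float(np.mean(np.logaddexp(0.0, logits) - label * logits))

def dual_discriminator_loss(d_real_src, d_replayed, d_target):
    # role (a): distinguish real source features from generated (replayed) ones
    realism = bce(d_real_src, 1.0) + bce(d_replayed, 0.0)
    # role (b): distinguish source-like features from target-domain ones
    domain = bce(d_real_src, 1.0) + bce(d_target, 0.0)
    return realism + domain

def alignment_loss(d_target):
    # the feature extractor is trained so target features fool the domain head
    return bce(d_target, 1.0)
```

The encoder minimizing `alignment_loss` while the discriminator minimizes `dual_discriminator_loss` is the usual adversarial alignment game, here shown on raw logit arrays rather than network outputs.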
198
Huang Z, Zhao R, Leung FHF, Banerjee S, Lam KM, Zheng YP, Ling SH. Landmark Localization From Medical Images With Generative Distribution Prior. IEEE Trans Med Imaging 2024; 43:2679-2692. PMID: 38421850. DOI: 10.1109/tmi.2024.3371948.
Abstract
In medical image analysis, anatomical landmarks usually carry strong prior knowledge of their structural information. In this paper, we propose to improve medical landmark localization by modeling the underlying landmark distribution via normalizing flows. Specifically, we introduce the flow-based landmark distribution prior as a learnable objective function into a regression-based landmark localization framework. Moreover, we employ an integral operation to make the mapping from heatmaps to coordinates differentiable, further enhancing heatmap-based localization with the learned distribution prior. Our proposed Normalizing Flow-based Distribution Prior (NFDP) employs a straightforward, non-problem-tailored backbone (i.e., ResNet18), yet delivers high-fidelity outputs across three X-ray-based landmark localization datasets. Remarkably, NFDP achieves this with minimal additional computational burden, since the normalizing-flow module is detached from the framework at inference time. Compared with existing techniques, NFDP provides a superior balance between prediction accuracy and inference speed, making it a highly efficient and effective approach. The source code of this paper is available at https://github.com/jacksonhzx95/NFDP.
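The "integral operation" that makes the heatmap-to-coordinate mapping differentiable is commonly realized as a soft-argmax: the expected coordinate under a softmax of the heatmap. A minimal sketch of that operation alone (the normalizing-flow prior is omitted; shapes and the temperature parameter are illustrative assumptions):

```python
import numpy as np

def soft_argmax_2d(heatmap, beta=1.0):
    """Differentiable expected (x, y) coordinate of a 2-D heatmap.

    Unlike a hard argmax, this is a smooth function of every heatmap
    value, so gradients flow back through the coordinate prediction.
    """
    h, w = heatmap.shape
    p = np.exp(beta * (heatmap - heatmap.max()))   # stabilized softmax
    p /= p.sum()                                   # normalize over all pixels
    ys, xs = np.mgrid[0:h, 0:w]                    # pixel coordinate grids
    return float((p * xs).sum()), float((p * ys).sum())
```

As the temperature `beta` grows, the expectation approaches the hard argmax of the heatmap; at small `beta` it blends information from the whole map.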
199
Zhu K, Shen Z, Wang M, Jiang L, Zhang Y, Yang T, Zhang H, Zhang M. Visual Knowledge Domain of Artificial Intelligence in Computed Tomography: A Review Based on Bibliometric Analysis. J Comput Assist Tomogr 2024; 48:652-662. PMID: 38271538. DOI: 10.1097/rct.0000000000001585.
Abstract
Artificial intelligence (AI)-assisted medical imaging technology is a research area of great interest that has developed rapidly over the last decade, yet no bibliometric analysis of published studies in this field has been conducted. The present review focuses on AI-related studies on computed tomography imaging in the Web of Science database and uses CiteSpace and VOSviewer to generate a knowledge map and conduct basic information analysis, co-word analysis, and co-citation analysis. A total of 7265 documents were included, and the number of documents published showed an overall upward trend. Scholars from the United States and China have made outstanding contributions, but extensive cooperation in this field is generally lacking. In recent years, the most active and challenging research topics have been the optimization and upgrading of algorithms and the translation of theoretical models into practical clinical applications. This review will help researchers understand the developments, hot topics, and research frontiers in this field and provide reference and guidance for future studies.
200
Rossi L, Fiorentino MC, Mancini A, Paolanti M, Rosati R, Zingaretti P. Generalizability and robustness evaluation of attribute-based zero-shot learning. Neural Netw 2024; 175:106278. PMID: 38581809. DOI: 10.1016/j.neunet.2024.106278.
Abstract
In the field of deep learning, large quantities of data are typically required to effectively train models. This challenge has given rise to techniques like zero-shot learning (ZSL), which trains models on a set of "seen" classes and evaluates them on a set of "unseen" classes. Although ZSL has shown considerable potential, particularly with the employment of generative methods, its generalizability to real-world scenarios remains uncertain. The hypothesis of this work is that the performance of ZSL models is systematically influenced by the chosen "splits", in particular by the statistical properties of the classes and attributes used in training. In this paper, we test this hypothesis by introducing the concepts of generalizability and robustness in attribute-based ZSL and carry out a variety of experiments to stress-test ZSL models against different splits. Our aim is to lay the groundwork for future research on ZSL models' generalizability, robustness, and practical applications. We evaluate the accuracy of state-of-the-art models on benchmark datasets and identify consistent trends in generalizability and robustness. We analyze how these properties vary with the dataset type, differentiating between coarse- and fine-grained datasets, and our findings indicate significant room for improvement in both generalizability and robustness. Furthermore, our results demonstrate the effectiveness of dimensionality reduction techniques in improving the performance of state-of-the-art models on fine-grained datasets.
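The attribute-based ZSL setting the paper stress-tests can be summarized in a toy form: an image's predicted attribute vector is matched against per-class attribute signatures of the unseen classes. A minimal sketch (the cosine matching rule, the signatures, and all data below are made-up illustrations, not the paper's evaluated models):

```python
import numpy as np

def zsl_predict(pred_attrs, class_attrs):
    """Index of the unseen class whose attribute signature has the highest
    cosine similarity to the predicted attribute vector."""
    a = pred_attrs / np.linalg.norm(pred_attrs)
    c = class_attrs / np.linalg.norm(class_attrs, axis=1, keepdims=True)
    return int(np.argmax(c @ a))

# Hypothetical unseen-class signatures: rows = classes, columns = attributes
# (e.g. "striped", "spotted", "four-legged", "winged").
signatures = np.array([[1.0, 0.0, 1.0, 0.0],
                       [0.0, 1.0, 0.0, 1.0]])
```

The paper's point is that which classes and attributes end up in the seen/unseen split (the rows and columns available at training time) systematically changes how well such a matcher transfers.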
Affiliation(s)
- Luca Rossi, Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Maria Chiara Fiorentino, Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Adriano Mancini, Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Marina Paolanti, Dipartimento di Scienze politiche, della Comunicazione e delle Relazioni Internazionali, Università di Macerata, 62100, Macerata, Italy
- Riccardo Rosati, Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy
- Primo Zingaretti, Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Via Brecce Bianche 12, 60131, Ancona, Italy