1. Xiao G, Yu J, Ma J, Fan DP, Shao L. Latent Semantic Consensus for Deterministic Geometric Model Fitting. IEEE Trans Pattern Anal Mach Intell 2024; 46:6139-6153. PMID: 38478435. DOI: 10.1109/tpami.2024.3376731.
Abstract
Estimating reliable geometric model parameters from data contaminated by severe outliers is a fundamental task in computer vision. This paper attempts to sample high-quality subsets and select model instances to estimate parameters in multi-structural data. To this end, we propose an effective method called Latent Semantic Consensus (LSC), whose principle is to preserve the latent semantic consensus in both data points and model hypotheses. Specifically, LSC formulates the model fitting problem into two latent semantic spaces, based on data points and model hypotheses respectively. It then explores the distributions of points in these two spaces to remove outliers, generate high-quality model hypotheses, and effectively estimate model instances. Owing to its deterministic nature and efficiency, LSC provides consistent and reliable solutions within a few milliseconds for general multi-structural model fitting. Compared with several state-of-the-art fitting methods, LSC is significantly superior in both accuracy and speed on synthetic data and real images.
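LSC itself is deterministic and built on learned latent semantic spaces; for orientation, the multi-structural fitting problem it targets can be illustrated with a much simpler sequential-RANSAC baseline (the function names, thresholds, and line-fitting setup below are illustrative, not from the paper):

```python
import numpy as np

def fit_line(p, q):
    """Line through two points as (a, b, c) with a*x + b*y + c = 0, ||(a,b)|| = 1."""
    d = q - p
    n = np.array([-d[1], d[0]])
    n = n / np.linalg.norm(n)
    return n[0], n[1], -n @ p

def sequential_ransac(points, n_models=2, iters=500, thresh=0.05, seed=0):
    """Toy multi-structure fitting: repeatedly run RANSAC, then remove the
    inliers of the winning model before searching for the next structure."""
    rng = np.random.default_rng(seed)
    remaining = points.copy()
    models = []
    for _ in range(n_models):
        best_inliers, best_model = None, None
        for _ in range(iters):
            i, j = rng.choice(len(remaining), size=2, replace=False)
            a, b, c = fit_line(remaining[i], remaining[j])
            resid = np.abs(remaining @ np.array([a, b]) + c)
            inliers = resid < thresh
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers, best_model = inliers, (a, b, c)
        models.append(best_model)
        remaining = remaining[~best_inliers]
        if len(remaining) < 2:
            break
    return models
```

Unlike LSC, this baseline is stochastic and must generate many hypotheses; it merely shows the problem setting of recovering several model instances from one contaminated point set.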
2. Zhang S, Ma J. ConvMatch: Rethinking Network Design for Two-View Correspondence Learning. IEEE Trans Pattern Anal Mach Intell 2024; 46:2920-2935. PMID: 37983155. DOI: 10.1109/tpami.2023.3334515.
Abstract
The multilayer perceptron (MLP) has become the de facto backbone in two-view correspondence learning, since it can extract effective deep features from unordered correspondences individually. However, its native lack of context information limits performance, even though many context-capturing modules have been appended in follow-up studies. In this paper, we take a novel perspective and design a correspondence learning network called ConvMatch that, for the first time, can leverage a convolutional neural network (CNN) as the backbone, which is inherently capable of context aggregation. Specifically, observing that sparse motion vectors and a dense motion field can be converted into each other by interpolation and sampling, we regularize the putative motion vectors by implicitly estimating the dense motion field, rectify the errors caused by outliers in local areas with the CNN, and finally obtain correct motion vectors from the rectified field. Moreover, we propose global information injection and bilateral convolution to better fit the overall spatial transformation and to accommodate discontinuities of the motion field under large scene disparity. Extensive experiments show that ConvMatch consistently outperforms state-of-the-art methods on relative pose estimation, homography estimation, and visual localization.
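The sparse-to-dense observation at the heart of ConvMatch can be sketched independently of the network: scatter sparse motion vectors onto a dense grid by interpolation, then read motion back at arbitrary locations by sampling. A minimal numpy version using inverse-distance weighting (the paper's actual interpolation scheme and CNN rectification are not reproduced here):

```python
import numpy as np

def densify(coords, vectors, grid_hw, eps=1e-8):
    """Interpolate sparse motion vectors (N, 2) given at coords (N, 2 in [0,1]^2)
    onto a dense (H, W, 2) motion field with inverse-distance weighting."""
    H, W = grid_hw
    ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(-1, 2)            # (H*W, 2)
    d2 = ((grid[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # (H*W, N)
    w = 1.0 / (d2 + eps)
    w /= w.sum(axis=1, keepdims=True)
    return (w @ vectors).reshape(H, W, 2)

def sample(field, coords):
    """Read the dense field back at coords (nearest-neighbor sampling)."""
    H, W, _ = field.shape
    js = np.clip(np.round(coords[:, 0] * (W - 1)).astype(int), 0, W - 1)
    is_ = np.clip(np.round(coords[:, 1] * (H - 1)).astype(int), 0, H - 1)
    return field[is_, js]
```

In ConvMatch the dense field is where convolution operates; sampling it back at the putative correspondence locations yields the rectified motion vectors.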
3. Lin S, Chen X, Xiao G, Wang H, Huang F, Weng J. Multi-Stage Network With Geometric Semantic Attention for Two-View Correspondence Learning. IEEE Trans Image Process 2024; 33:3031-3046. PMID: 38656841. DOI: 10.1109/tip.2024.3391002.
Abstract
The removal of outliers is crucial for establishing correspondences between two images, but when the proportion of outliers approaches 90%, the task becomes highly challenging. Existing methods have difficulty effectively utilizing geometric transformation consistency (GTC) information and incorporating geometric semantic neighboring information. To address these challenges, we propose a Multi-Stage Geometric Semantic Attention (MSGSA) network consisting of three key modules: a multi-branch (MB) module, a GTC module, and a geometric semantic attention (GSA) module. The MB module, with its multi-branch design, facilitates diverse and robust spatial transformations. The GTC module captures transformation consistency information from the preceding stage. The GSA module categorizes the input based on the prior stage's output, enabling efficient extraction of geometric semantic information through a graph-based representation and inter-category information interaction using a Transformer. Extensive experiments on the YFCC100M and SUN3D datasets demonstrate that MSGSA outperforms current state-of-the-art methods in outlier removal and camera pose estimation, particularly in scenarios with a high prevalence of outliers. Source code is available at https://github.com/shuyuanlin.
4. Zhang Y, An P, Li Z, Liu Q, Yang Y. See farther and more: a master-slave UAVs based synthetic optical aperture imaging system with wide and dynamic baseline. Opt Express 2024; 32:11346-11362. PMID: 38570984. DOI: 10.1364/oe.520677.
Abstract
Designing an optical system that captures aerial images with both a wide field of view (FoV) and high resolution remains an open challenge. A single camera on one unmanned aerial vehicle (UAV) can hardly deliver both. Conventional swarm UAVs can form a camera array with a short or fixed baseline and capture wide-FoV, high-resolution images, but at the cost of requiring many UAVs. We aim to design a camera array with a wide and dynamic baseline so that fewer UAVs are needed to form a synthetic optical aperture. Accordingly, we propose a master-slave UAV-based synthetic optical aperture imaging system with a wide and dynamic baseline. The system consists of one master UAV and multiple slave UAVs: the master provides the global FoV and the slaves provide local FoVs, improving the efficiency of image acquisition. In such a system, fusing the UAV images becomes a new challenge for two reasons: (i) the small FoV overlap between slave UAVs and (ii) the resolution-scale gap between slave and master images. To address this, a coarse-to-fine stitching method is proposed that stitches the multi-view images into a single wide-FoV, high-resolution result. A video stabilization method is also designed for the proposed system. Experiments on real data demonstrate that the proposed imaging system achieves high-quality results.
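Stitching multi-view images into one wide-FoV mosaic ultimately rests on estimating a homography between overlapping views. The paper's coarse-to-fine pipeline is more elaborate; the generic building block, direct linear transform (DLT) homography estimation from point correspondences, can be sketched as:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src from >= 4
    correspondences (N, 2 each) via the direct linear transform:
    stack two linear constraints per correspondence, take the SVD
    null vector of the stacked system."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]            # fix the arbitrary scale

def apply_h(H, pts):
    """Apply a homography to (N, 2) points (homogeneous divide)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]
```

Production stitchers add coordinate normalization and robust estimation on top of this core; the sketch assumes clean correspondences.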
5. El Saer A, Grammatikopoulos L, Sfikas G, Karras G, Petsa E. A Novel Framework for Image Matching and Stitching for Moving Car Inspection under Illumination Challenges. Sensors (Basel) 2024; 24:1083. PMID: 38400240. PMCID: PMC10891783. DOI: 10.3390/s24041083.
Abstract
Vehicle exterior inspection is a critical operation for identifying defects and ensuring the overall safety and integrity of vehicles. Visual inspection of moving objects, such as vehicles in dynamic environments full of reflections, is challenging, especially when time and accuracy are paramount. Conventional exterior inspections require substantial labor, which is both costly and error-prone. Recent advances in deep learning have reduced this labor by enabling segmentation algorithms to detect and describe defects from simple RGB camera acquisitions. Nonetheless, these processes struggle with image orientation, making it difficult to differentiate between detected defects and resulting in numerous false positives and additional labor. Estimating image poses enables precise localization of vehicle damage in a unified 3D reference system after the initial detections in the 2D imagery. A primary challenge here is extracting distinctive features and establishing accurate correspondences between them, a task that typical image matching techniques struggle with for highly reflective moving objects. In this study, we introduce an end-to-end pipeline tailored for efficient image matching and stitching that specifically addresses moving objects captured by static, uncalibrated camera setups. To handle the strong reflections that defeat current matching algorithms, we introduce a novel filtering scheme that can be applied to any image matching process, provided the input features are sufficient. A critical aspect of this module is the exclusion of background points, effectively distinguishing them from points on the vehicle itself, which is essential for accurate feature extraction and subsequent analysis. Finally, we generate a high-quality image mosaic from a series of sequential stereo-rectified pairs.
Affiliation(s)
- Andreas El Saer: Department of Surveying and Geoinformatics Engineering, University of West Attica, 12243 Athens, Greece
- Lazaros Grammatikopoulos: Department of Surveying and Geoinformatics Engineering, University of West Attica, 12243 Athens, Greece
- Giorgos Sfikas: Department of Surveying and Geoinformatics Engineering, University of West Attica, 12243 Athens, Greece
- George Karras: School of Rural, Surveying and Geoinformatics Engineering, National Technical University of Athens, 15780 Athens, Greece
- Elli Petsa: Department of Surveying and Geoinformatics Engineering, University of West Attica, 12243 Athens, Greece
6. Huang Q, Xiang T, Zhao Z, Wu K, Li H, Cheng R, Zhang L, Cheng Z. Directional region-based feature point matching algorithm based on SURF. J Opt Soc Am A Opt Image Sci Vis 2024; 41:157-164. PMID: 38437328. DOI: 10.1364/josaa.501371.
Abstract
Feature point matching is one of the fundamental tasks in binocular vision and directly affects the accuracy and quality of 3D reconstruction. This study proposes a directional region-based feature point matching algorithm built on SURF to improve matching accuracy. First, homologous (same-name) points are selected as matching reference points in the left and right images. The SURF algorithm is then used to extract feature points and construct their descriptors. During matching, the positional relationship between a query feature point and the reference point in the left image is used to determine the corresponding matching region in the right image, and matching is completed within this region based on Euclidean distance. Finally, the grid-based motion statistics (GMS) algorithm eliminates mismatches. Experimental results show that the proposed algorithm substantially improves matching accuracy and the number of valid matches, particularly under heavy noise and interference, while exhibiting good robustness and stability.
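The region-restriction idea can be sketched without SURF itself: use the offset of each query keypoint from a matched reference pair to predict where its counterpart should lie in the right image, and only match descriptors within that region. This toy version assumes a near-pure translation between views and generic descriptor vectors (the paper's directional-region construction and GMS filtering are not reproduced):

```python
import numpy as np

def region_matching(kp_l, desc_l, kp_r, desc_r, ref_l, ref_r, radius=20.0):
    """For each left keypoint, predict its right-image location from its
    offset to a matched reference point, then match by Euclidean descriptor
    distance only among right keypoints inside that region.
    Returns a list of (left_index, right_index) pairs."""
    matches = []
    for i, (p, d) in enumerate(zip(kp_l, desc_l)):
        predicted = ref_r + (p - ref_l)          # translation assumption
        in_region = np.linalg.norm(kp_r - predicted, axis=1) <= radius
        cand = np.flatnonzero(in_region)
        if cand.size == 0:
            continue                             # no candidate in the region
        dist = np.linalg.norm(desc_r[cand] - d, axis=1)
        matches.append((i, int(cand[np.argmin(dist)])))
    return matches
```

Restricting candidates to a predicted region both speeds up matching and rejects many gross mismatches before any descriptor comparison.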
7. Cobanaj M, Corti C, Dee EC, McCullum L, Boldrini L, Schlam I, Tolaney SM, Celi LA, Curigliano G, Criscitiello C. Advancing equitable and personalized cancer care: Novel applications and priorities of artificial intelligence for fairness and inclusivity in the patient care workflow. Eur J Cancer 2024; 198:113504. PMID: 38141549. DOI: 10.1016/j.ejca.2023.113504.
Abstract
Patient care workflows are highly multimodal and intertwined: the intersection of data outputs provided from different disciplines and in different formats remains one of the main challenges of modern oncology. Artificial Intelligence (AI) has the potential to revolutionize the current clinical practice of oncology owing to advancements in digitalization, database expansion, computational technologies, and algorithmic innovations that facilitate discernment of complex relationships in multimodal data. Within oncology, radiation therapy (RT) represents an increasingly complex working procedure, involving many labor-intensive and operator-dependent tasks. In this context, AI has gained momentum as a powerful tool to standardize treatment performance and reduce inter-observer variability in a time-efficient manner. This review explores the hurdles associated with the development, implementation, and maintenance of AI platforms and highlights current measures in place to address them. In examining AI's role in oncology workflows, we underscore that a thorough and critical consideration of these challenges is the only way to ensure equitable and unbiased care delivery, ultimately serving patients' survival and quality of life.
Affiliation(s)
- Marisa Cobanaj: National Center for Radiation Research in Oncology, OncoRay, Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany
- Chiara Corti: Breast Oncology Program, Dana-Farber Brigham Cancer Center, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Division of New Drugs and Early Drug Development for Innovative Therapies, European Institute of Oncology, IRCCS, Milan, Italy; Department of Oncology and Hematology-Oncology (DIPO), University of Milan, Milan, Italy
- Edward C Dee: Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Lucas McCullum: Department of Radiation Oncology, MD Anderson Cancer Center, Houston, TX, USA
- Laura Boldrini: Division of New Drugs and Early Drug Development for Innovative Therapies, European Institute of Oncology, IRCCS, Milan, Italy; Department of Oncology and Hematology-Oncology (DIPO), University of Milan, Milan, Italy
- Ilana Schlam: Department of Hematology and Oncology, Tufts Medical Center, Boston, MA, USA; Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sara M Tolaney: Breast Oncology Program, Dana-Farber Brigham Cancer Center, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Leo A Celi: Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Giuseppe Curigliano: Division of New Drugs and Early Drug Development for Innovative Therapies, European Institute of Oncology, IRCCS, Milan, Italy; Department of Oncology and Hematology-Oncology (DIPO), University of Milan, Milan, Italy
- Carmen Criscitiello: Division of New Drugs and Early Drug Development for Innovative Therapies, European Institute of Oncology, IRCCS, Milan, Italy; Department of Oncology and Hematology-Oncology (DIPO), University of Milan, Milan, Italy
8. Wang AQ, Yu EM, Dalca AV, Sabuncu MR. A robust and interpretable deep learning framework for multi-modal registration via keypoints. Med Image Anal 2023; 90:102962. PMID: 37769550. PMCID: PMC10591968. DOI: 10.1016/j.media.2023.102962.
Abstract
We present KeyMorph, a deep learning-based image registration framework that relies on automatically detecting corresponding keypoints. State-of-the-art deep learning methods for registration are often not robust to large misalignments, are not interpretable, and do not incorporate the symmetries of the problem; in addition, most models produce only a single prediction at test time. Our core insight, which addresses these shortcomings, is that corresponding keypoints between images can be used to obtain the optimal transformation via a differentiable closed-form expression. We use this observation to drive end-to-end learning of keypoints tailored to the registration task, without knowledge of ground-truth keypoints. This framework not only leads to substantially more robust registration but also yields better interpretability, since the keypoints reveal which parts of the image drive the final alignment. Moreover, KeyMorph can be designed to be equivariant under image translations and/or symmetric with respect to the input image ordering. Finally, we show how multiple deformation fields, corresponding to different transformation variants, can be computed efficiently in closed form at test time. We demonstrate the framework on 3D affine and spline-based registration of multi-modal brain MRI scans, achieving registration accuracy that surpasses current state-of-the-art methods, especially for large displacements. Our code is available at https://github.com/alanqrwang/keymorph.
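The closed-form step that this family of methods relies on, recovering the optimal affine transform from corresponding keypoints by least squares, is a standard textbook solution and can be written in a few lines (this is the generic building block, not the authors' full framework):

```python
import numpy as np

def affine_from_keypoints(moving, fixed):
    """Least-squares affine A (3x4) minimizing ||A @ [p; 1] - q|| over
    corresponding 3D keypoints moving (N, 3) -> fixed (N, 3).
    The solution is closed-form and differentiable in the keypoint
    coordinates, which is what allows keypoints to be learned end-to-end."""
    P = np.hstack([moving, np.ones((len(moving), 1))])   # homogeneous (N, 4)
    A, *_ = np.linalg.lstsq(P, fixed, rcond=None)        # (4, 3) solution
    return A.T                                           # (3, 4)

def transform(A, pts):
    """Apply a (3, 4) affine to (N, 3) points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A.T
```

Because `lstsq` reduces to matrix products and an inverse, the same expression can be written in an autodiff framework and backpropagated through, as the abstract describes.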
Affiliation(s)
- Alan Q Wang: School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY 10044, USA; Department of Radiology, Weill Cornell Medical School, New York, NY 10065, USA
- Evan M Yu: Iterative Scopes, Cambridge, MA 02139, USA
- Adrian V Dalca: Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; A.A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA 02129, USA
- Mert R Sabuncu: School of Electrical and Computer Engineering, Cornell University and Cornell Tech, New York, NY 10044, USA; Department of Radiology, Weill Cornell Medical School, New York, NY 10065, USA
9. Ma X, He J, Liu X, Liu Q, Chen G, Yuan B, Li C, Xia Y. Hierarchical cumulative network for unsupervised medical image registration. Comput Biol Med 2023; 167:107598. PMID: 37913614. DOI: 10.1016/j.compbiomed.2023.107598.
Abstract
Unsupervised deep learning techniques have gained increasing popularity in deformable medical image registration. However, existing methods usually overlook the optimal similarity position between moving and fixed images. To tackle this issue, we propose a novel hierarchical cumulative network (HCN), which explicitly considers the optimal similarity position through an effective Bidirectional Asymmetric Registration Module (BARM). The BARM simultaneously learns two asymmetric displacement vector fields (DVFs) that optimally warp both the moving and fixed images toward their most similar shape along the geodesic path. Furthermore, we incorporate the BARM into a Laplacian pyramid network with hierarchical recursion, in which the moving image at the lowest pyramid level is warped successively to align with the fixed image at that level, capturing multiple DVFs. These DVFs are then accumulated and up-sampled to warp the moving images at higher pyramid levels toward the fixed image at the top level. The entire system is end-to-end and jointly trained in an unsupervised manner. Extensive experiments on two public 3D brain MRI datasets demonstrate that HCN outperforms both traditional and state-of-the-art registration methods. We further evaluated HCN on the validation set of the MICCAI Learn2Reg 2021 challenge and conducted a cross-dataset evaluation to assess its generalization. The results show that HCN is an effective deformable registration method with excellent generalization performance.
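Accumulating DVFs across pyramid levels means composing displacement fields rather than simply adding them: the second field must be sampled at the positions already displaced by the first. A minimal 2D sketch with nearest-neighbor sampling (illustrative only; registration networks like the one described use bilinear warping inside a learned pyramid):

```python
import numpy as np

def compose(u1, u2):
    """Compose 2D displacement fields (H, W, 2), channel 0 = dy, channel 1 = dx.
    Applying u1 then u2 gives u(x) = u1(x) + u2(x + u1(x));
    u2 is read at the displaced positions with nearest-neighbor sampling."""
    H, W, _ = u1.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    yy = np.clip(np.round(ys + u1[..., 0]).astype(int), 0, H - 1)
    xx = np.clip(np.round(xs + u1[..., 1]).astype(int), 0, W - 1)
    return u1 + u2[yy, xx]
```

For two constant translations the composition reduces to their sum, which makes a convenient sanity check; for spatially varying fields the sampling step is what plain addition would get wrong.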
Affiliation(s)
- Xinke Ma: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Jiang He: Huiying Medical Technology Co., Ltd., Room A206, B2, Dongsheng Science and Technology Park, Haidian District, Beijing 100192, China
- Xing Liu: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Qin Liu: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Geng Chen: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Bo Yuan: Sichuan Provincial Health Information Center (Sichuan Provincial Health and Medical Big Data Center), Chengdu 610041, China
- Changyang Li: Sydney Polytechnic Institute, NSW 2000, Australia
- Yong Xia: National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
10. Fan Y, Mao S, Li M, Kang J, Li B. LMFD: lightweight multi-feature descriptors for image stitching. Sci Rep 2023; 13:21162. PMID: 38036564. PMCID: PMC10689729. DOI: 10.1038/s41598-023-48432-7.
Abstract
Image stitching is a fundamental pillar of computer vision, and its effectiveness hinges significantly on the quality of the feature descriptors. However, existing feature descriptors face several challenges, including inadequate robustness to noise or rotational transformations and limited adaptability for hardware deployment. To address these limitations, this paper proposes a set of feature descriptors for image stitching named Lightweight Multi-Feature Descriptors (LMFD). Based on extensive extraction of gradients, means, and global information around the feature points, descriptors are generated through various combinations to enhance the stitching process, endowing the algorithm with strong rotational invariance and noise resistance and thereby improving its accuracy and reliability. Furthermore, the descriptors take the form of binary matrices of 0s and 1s, which not only facilitates efficient hardware deployment but also enhances computational efficiency: the binary form significantly reduces the computational complexity of the algorithm while preserving its efficacy. To validate the effectiveness of LMFD, rigorous experiments were conducted on the HPatches and 2D-HeLa datasets. The results demonstrate that LMFD outperforms state-of-the-art image matching algorithms in terms of accuracy, substantiating its potential for practical applications in various domains.
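The efficiency argument for binary descriptors comes from their distance function: matching reduces to XOR plus popcount, with no floating-point arithmetic. A minimal brute-force Hamming matcher over 0/1 descriptor matrices (generic, not LMFD's specific descriptor layout):

```python
import numpy as np

def hamming_match(desc_a, desc_b):
    """Match binary descriptors (rows of 0/1 values) by minimum Hamming
    distance; returns, for each row of desc_a, the best index in desc_b."""
    # XOR of 0/1 entries is just inequality; popcount is a sum over bits
    d = (desc_a[:, None, :] != desc_b[None, :, :]).sum(axis=2)
    return d.argmin(axis=1)
```

In hardware or packed-bit software implementations the same distance is computed with word-wide XOR and popcount instructions, which is where the deployment advantage described in the abstract comes from.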
Affiliation(s)
- Yingbo Fan: Institute of Remote Sensing and Geographic Information Systems, Peking University, No.5 Summer Palace Road, Beijing, 100000, China
- Shanjun Mao: Institute of Remote Sensing and Geographic Information Systems, Peking University, No.5 Summer Palace Road, Beijing, 100000, China
- Mei Li: Institute of Remote Sensing and Geographic Information Systems, Peking University, No.5 Summer Palace Road, Beijing, 100000, China
- Jitong Kang: Institute of Remote Sensing and Geographic Information Systems, Peking University, No.5 Summer Palace Road, Beijing, 100000, China
- Ben Li: Institute of Remote Sensing and Geographic Information Systems, Peking University, No.5 Summer Palace Road, Beijing, 100000, China
11. Lomas-Barrie V, Suarez-Espinoza M, Hernandez-Chavez G, Neme A. A New Method for Classifying Scenes for Simultaneous Localization and Mapping Using the Boundary Object Function Descriptor on RGB-D Points. Sensors (Basel) 2023; 23:8836. PMID: 37960535. PMCID: PMC10648618. DOI: 10.3390/s23218836.
Abstract
Scene classification in autonomous navigation is a highly complex task due to variations in the inspected scenes, such as lighting conditions and dynamic objects; it is also challenging for small-factor computers to run modern, highly demanding algorithms. In this contribution, we introduce a novel method for classifying scenes in simultaneous localization and mapping (SLAM) using the boundary object function (BOF) descriptor on RGB-D points, aiming to reduce complexity at almost no performance cost. All the BOF-based descriptors of the objects in a scene are combined to define the scene class. Instead of traditional image classification features such as ORB or SIFT, we use the BOF descriptor. With an RGB-D camera, we capture points and project them onto layers that are perpendicular to the camera plane. From each plane, we extract the boundaries of objects such as furniture, ceilings, walls, or doors. The extracted features compose a bag of visual words classified by a support vector machine. The proposed method achieves almost the same scene classification accuracy as a SIFT-based algorithm while being 2.38× faster. Experimental results on the 7-Scenes and SUN RGB-D datasets demonstrate the method's accuracy and robustness.
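A boundary object function reduces an object's outline to a 1-D signature: the distance from the centroid to the boundary, sampled over angle and normalized for scale. A toy version of that idea (the paper's exact sampling and normalization are assumptions here):

```python
import numpy as np

def bof_signature(boundary, n_bins=32):
    """Centroid-to-boundary distance per angular bin, normalized by its
    maximum; scale-invariant by construction. boundary is (N, 2)."""
    c = boundary.mean(axis=0)
    rel = boundary - c
    ang = np.arctan2(rel[:, 1], rel[:, 0])                    # in [-pi, pi]
    r = np.linalg.norm(rel, axis=1)
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    sig = np.zeros(n_bins)
    for b in range(n_bins):
        hit = bins == b
        if hit.any():
            sig[b] = r[hit].max()                             # farthest point per bin
    return sig / sig.max()
```

Such low-dimensional signatures are cheap to compute and compare, which is consistent with the abstract's goal of running on small-factor computers; the per-object signatures then become the visual words fed to the SVM.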
Affiliation(s)
- Victor Lomas-Barrie: Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Mario Suarez-Espinoza: Facultad de Ingeniería, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
- Antonio Neme: Instituto de Investigaciones en Matematicas Aplicadas y en Sistemas, Universidad Nacional Autonoma de Mexico, Mexico City 04510, Mexico
12. Magnier B, Hayat K. Revisiting Mehrotra and Nichani's Corner Detection Method for Improvement with Truncated Anisotropic Gaussian Filtering. Sensors (Basel) 2023; 23:8653. PMID: 37896745. PMCID: PMC10611396. DOI: 10.3390/s23208653.
Abstract
In the early 1990s, Mehrotra and Nichani developed a filtering-based corner detection method which, though conceptually intriguing, suffered from limited reliability and has therefore been rarely referenced in the literature. Despite this underappreciation, its core concept, rooted in the half-edge idea and the directional truncated first derivative of Gaussian, holds significant promise. This article presents a comprehensive qualitative and quantitative assessment of an enhanced version of the corner detection algorithm, exploring its strengths, limitations, and overall effectiveness through visual examples and systematic evaluations. Experiments on both synthetic and real images demonstrate the efficiency and reliability of the proposed algorithm and substantiate that our modifications transform the method into one that outperforms established benchmark techniques. Owing to its ease of implementation, the improved corner detection process can serve as a valuable reference for the computer vision community when dealing with corner detection algorithms.
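The filter family at the core of this line of work, a first derivative of an anisotropic Gaussian, steered to an orientation and truncated to one side to form a half-edge detector, can be sketched as a kernel generator (parameter names and the exact truncation rule are our assumptions, not the paper's):

```python
import numpy as np

def oriented_dog_kernel(size=15, sigma_u=4.0, sigma_v=1.5, theta=0.0, truncate=True):
    """First derivative of an anisotropic Gaussian along its long axis u,
    rotated by theta; optionally truncated to the half-plane u >= 0
    (the "half-edge" variant). Returns an L1-normalized (size, size) kernel."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    u = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinates
    v = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(u ** 2) / (2 * sigma_u ** 2) - (v ** 2) / (2 * sigma_v ** 2))
    k = -u / sigma_u ** 2 * g                     # d/du of the Gaussian
    if truncate:
        k[u < 0] = 0.0                            # keep one side only
    return k / np.abs(k).sum()
```

Convolving an image with a bank of such kernels over a set of orientations gives the directional half-edge responses from which corner evidence is assembled.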
Affiliation(s)
- Baptiste Magnier: Euromov Digital Health in Motion, Univ Montpellier, IMT Mines Ales, Ales, France
- Khizar Hayat: College of Arts and Sciences, University of Nizwa, Nizwa 616, Oman
13. Livieris IE, Pintelas E, Kiriakidou N, Pintelas P. Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM. J Imaging 2023; 9:224. PMID: 37888331. PMCID: PMC10606999. DOI: 10.3390/jimaging9100224.
Abstract
With the proliferation of image-based applications in various domains, the need for accurate and interpretable image similarity measures has become increasingly critical. Existing image similarity models often lack transparency, making it challenging to understand the reasons why two images are considered similar. In this paper, we propose the concept of explainable image similarity, where the goal is the development of an approach, which is capable of providing similarity scores along with visual factual and counterfactual explanations. Along this line, we present a new framework, which integrates Siamese Networks and Grad-CAM for providing explainable image similarity and discuss the potential benefits and challenges of adopting this approach. In addition, we provide a comprehensive discussion about factual and counterfactual explanations provided by the proposed framework for assisting decision making. The proposed approach has the potential to enhance the interpretability, trustworthiness and user acceptance of image-based systems in real-world image similarity applications.
Affiliation(s)
- Ioannis E. Livieris: Department of Statistics & Insurance, University of Piraeus, GR 185-34 Piraeus, Greece
- Emmanuel Pintelas: Department of Mathematics, University of Patras, GR 265-00 Patras, Greece
- Niki Kiriakidou: Department of Informatics and Telematics, Harokopio University of Athens, GR 177-78 Athens, Greece
- Panagiotis Pintelas: Department of Mathematics, University of Patras, GR 265-00 Patras, Greece
| |
Collapse
14
Li Z, Ma J, Xiao G. Density-Guided Incremental Dominant Instance Exploration for Two-View Geometric Model Fitting. IEEE Transactions on Image Processing 2023; 32:5408-5422. PMID: 37773911; DOI: 10.1109/tip.2023.3318945.
Abstract
Existing two-view multi-model fitting methods typically follow a two-step manner, i.e., model generation and selection, without considering their interaction. As a result, the first step has to generate a considerable number of instances in order to cover all desired ones, which not only offers no guarantees but also introduces unnecessarily expensive computation. To address this challenge, this study presents a new algorithm, termed D2Fitting, that incrementally explores dominant instances. Rather than viewing model generation and selection as two disjoint parts, D2Fitting fully considers their interaction and performs the two subroutines alternately under a simple yet effective optimization framework. This design avoids generating too many redundant instances, reducing computational overhead and allowing D2Fitting to run in real time. Meanwhile, we further design a novel density-guided sampler that draws high-quality minimal subsets during model generation, fully exploiting the spatial distribution of the input data. To mitigate the influence of noise on the subsets sampled by the proposed sampler, a global-residual optimization strategy is investigated for minimal-subset refinement. With all these ingredients, D2Fitting can accurately estimate the number and parameters of geometric models while efficiently segmenting the input data. Extensive experiments on several public datasets demonstrate the significant superiority of D2Fitting over several state-of-the-art methods.
15
Chen Z, Sun K, Yang F, Guo L, Tao W. SC²-PCR++: Rethinking the Generation and Selection for Efficient and Robust Point Cloud Registration. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:12358-12376. PMID: 37134034; DOI: 10.1109/tpami.2023.3272557.
Abstract
Outlier removal is a critical part of feature-based point cloud registration. In this article, we revisit the model generation and selection of the classic RANSAC approach for fast and robust point cloud registration. For the model generation, we propose a second-order spatial compatibility (SC²) measure to compute the similarity between correspondences. It takes into account global compatibility instead of local consistency, allowing for more distinctive clustering between inliers and outliers at an early stage. The proposed measure promises to find a certain number of outlier-free consensus sets using fewer samplings, making the model generation more efficient. For the model selection, we propose a new Feature and Spatial consistency constrained Truncated Chamfer Distance (FS-TCD) metric for evaluating the generated models. It considers the alignment quality, the feature matching properness, and the spatial consistency constraint simultaneously, enabling the correct model to be selected even when the inlier rate of the putative correspondence set is extremely low. Extensive experiments are carried out to investigate the performance of our method. In addition, we also experimentally prove that the proposed SC² measure and the FS-TCD metric are general and can be easily plugged into deep learning based frameworks.
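One common reading of a second-order compatibility measure is: score a pair of correspondences by how many other correspondences are compatible with both of them, rather than by their pairwise consistency alone, so that chance agreements between an inlier and an outlier get little support. A minimal sketch under that assumption (not the authors' exact formulation):

```python
def second_order_compatibility(C):
    """Given a binary first-order compatibility matrix C (C[i][j] = 1 when
    correspondences i and j are pairwise consistent), score each pair by the
    number of OTHER correspondences compatible with both, keeping only pairs
    that are already first-order compatible."""
    n = len(C)
    S = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and C[i][j]:
                S[i][j] = sum(C[i][k] and C[j][k]
                              for k in range(n) if k not in (i, j))
    return S

# Correspondences 0, 1, 2 are mutually compatible (inliers); 3 agrees with 1
# only by chance. The inlier pair (0, 1) is supported by correspondence 2,
# while the spurious pair (1, 3) receives no second-order support.
C = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]
S = second_order_compatibility(C)
```

Thresholding such second-order scores separates inlier clusters from chance agreements earlier than the raw pairwise matrix would.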
16
Tommasino C, Merolla F, Russo C, Staibano S, Rinaldi AM. Histopathological Image Deep Feature Representation for CBIR in Smart PACS. J Digit Imaging 2023; 36:2194-2209. PMID: 37296349; PMCID: PMC10501985; DOI: 10.1007/s10278-023-00832-x.
Abstract
Pathological anatomy is moving toward computerized processes, driven mainly by the extensive digitization of histology slides, which has made many Whole Slide Images (WSIs) available. Their use is essential, especially in cancer diagnosis and research, and raises a pressing need for increasingly capable information archiving and retrieval systems. Picture Archiving and Communication Systems (PACSs) offer a practical means to archive and organize this growing amount of data, but designing and implementing a robust, accurate methodology for querying them in the pathology domain remains an open problem. In particular, the Content-Based Image Retrieval (CBIR) methodology can be integrated into PACSs through a query-by-example task. In this context, a crucial point of CBIR is the representation of images as feature vectors, since retrieval accuracy depends mainly on feature extraction. Our study therefore explored different representations of WSI patches using features extracted from pre-trained Convolutional Neural Networks (CNNs). To perform a meaningful comparison, we evaluated features extracted from different layers of state-of-the-art CNNs using different dimensionality reduction techniques, and we provide a qualitative analysis of the results. The evaluation showed encouraging results for our proposed framework.
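The query-by-example step at the core of CBIR reduces to ranking stored feature vectors by similarity to the query image's vector. A minimal sketch with toy vectors standing in for CNN features (the names and values are illustrative, not the paper's data):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def query_by_example(query_vec, database):
    """Rank database entries (name -> feature vector) by cosine similarity
    to the query vector, most similar first. In the paper's setting the
    vectors would come from a pretrained CNN layer."""
    return sorted(database,
                  key=lambda name: cosine(query_vec, database[name]),
                  reverse=True)

db = {"slide_a": [1.0, 0.0, 0.0],
      "slide_b": [0.9, 0.1, 0.0],
      "slide_c": [0.0, 1.0, 0.0]}
print(query_by_example([1.0, 0.05, 0.0], db)[0])  # prints slide_a
```

The choice of which CNN layer supplies the vectors, and how they are reduced in dimension, is exactly what the study compares.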
Affiliation(s)
- Cristian Tommasino (Department of Electrical Engineering and Information Technology, University of Napoli Federico II, Via Claudio 21, Naples, 80125 Italy)
- Francesco Merolla (Department of Advanced Biomedical Sciences, Pathology Section, University of Naples Federico II, Naples, 80131 Italy)
- Cristiano Russo (Department of Electrical Engineering and Information Technology, University of Napoli Federico II, Via Claudio 21, Naples, 80125 Italy)
- Stefania Staibano (Department of Medicine and Health Sciences V. Tiberio, University of Molise, Campobasso, 86100 Italy)
- Antonio Maria Rinaldi (Department of Electrical Engineering and Information Technology, University of Napoli Federico II, Via Claudio 21, Naples, 80125 Italy)
17
Xu H, Yuan J, Ma J. MURF: Mutually Reinforcing Multi-Modal Image Registration and Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:12148-12166. PMID: 37285256; DOI: 10.1109/tpami.2023.3283682.
Abstract
Existing image fusion methods are typically limited to aligned source images and have to "tolerate" parallaxes when images are unaligned. Simultaneously, the large variances between different modalities pose a significant challenge for multi-modal image registration. This study proposes a novel method called MURF, where for the first time, image registration and fusion are mutually reinforced rather than being treated as separate issues. MURF leverages three modules: shared information extraction module (SIEM), multi-scale coarse registration module (MCRM), and fine registration and fusion module (F2M). The registration is carried out in a coarse-to-fine manner. During coarse registration, SIEM first transforms multi-modal images into mono-modal shared information to eliminate the modal variances. Then, MCRM progressively corrects the global rigid parallaxes. Subsequently, fine registration to repair local non-rigid offsets and image fusion are uniformly implemented in F2M. The fused image provides feedback to improve registration accuracy, and the improved registration result further improves the fusion result. For image fusion, rather than solely preserving the original source information in existing methods, we attempt to incorporate texture enhancement into image fusion. We test on four types of multi-modal data (RGB-IR, RGB-NIR, PET-MRI, and CT-MRI). Extensive registration and fusion results validate the superiority and universality of MURF.
18
Zhou C, Wang H, Zhou S, Yu Z, Bandara D, Bu J. Hierarchical Knowledge Propagation and Distillation for Few-Shot Learning. Neural Netw 2023; 167:615-625. PMID: 37713767; DOI: 10.1016/j.neunet.2023.08.040.
Abstract
Recent research efforts on Few-Shot Learning (FSL) have achieved extensive progress. However, the existing efforts primarily focus on the transductive setting of FSL, which is heavily challenged by the limited quantity of the unlabeled query set. Although a few inductive-based FSL methods have been studied, most of them emphasize learning superb feature extraction networks. As a result, they may ignore the relations between sample-level and class-level representations, which are particularly crucial when labeled samples are scarce. This paper proposes an inductive FSL framework that leverages the Hierarchical Knowledge Propagation and Distillation, named HKPD. To learn more discriminative sample-level representations, HKPD first constructs a sample-level information propagation module that explores pairwise sample relations. Subsequently, a class-level information propagation module is designed to obtain and update the class-level information. Moreover, a self-distillation module is adopted to further improve the learned representations by propagating the obtained knowledge across this hierarchical architecture. Extensive experiments conducted on the commonly used few-shot benchmark datasets demonstrate the superiority of the proposed HKPD method, which outperforms the current state-of-the-art methods.
Affiliation(s)
- Chunpeng Zhou (Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou, 310000, China)
- Haishuai Wang (Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou, 310000, China)
- Sheng Zhou (Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou, 310000, China)
- Zhi Yu (Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou, 310000, China)
- Danushka Bandara (Department of Computer Science and Engineering, Fairfield University, Fairfield, CT, 06824, USA)
- Jiajun Bu (Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University, Hangzhou, 310000, China)
19
Gan Z, Sun W, Liao K, Yang X. Probabilistic Modeling for Image Registration Using Radial Basis Functions: Application to Cardiac Motion Estimation. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:7324-7338. PMID: 35073271; DOI: 10.1109/tnnls.2022.3141119.
Abstract
Cardiovascular diseases (CVDs) are the leading cause of death, affecting cardiac dynamics over the cardiac cycle, and estimation of cardiac motion plays an essential role in many clinical tasks. This article proposes a probabilistic framework for image registration using compactly supported radial basis functions (CSRBFs) to estimate cardiac motion. A variational inference-based generative model with convolutional neural networks (CNNs) is proposed to learn the probabilistic coefficients of the CSRBFs used in image deformation. We designed two networks to estimate the deformation coefficients of the CSRBFs: the first solves the spatial transformation using given control points, and the second models the transformation using drifting control points. The given-point-based network estimates the probabilistic coefficients of the control points, whereas the drifting-point-based model predicts the probabilistic coefficients and the spatial distribution of the control points simultaneously. To regularize these coefficients, we derive the bending energy (BE) term in the variational bound by defining the covariance of the coefficients. The proposed framework has been evaluated on cardiac motion estimation and the calculation of myocardial strain. In the experiments, 1409 slice pairs of the end-diastolic (ED) and end-systolic (ES) phases in 4-D cardiac magnetic resonance (MR) images, selected from three public datasets, are employed to evaluate our networks. The experimental results show that our framework outperforms state-of-the-art registration methods in terms of deformation smoothness and registration accuracy.
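A compactly supported RBF such as the Wendland C2 function vanishes outside a finite support radius, so each control point only deforms its own neighborhood. A 1-D sketch with plain scalar coefficients, which in the paper are probabilistic quantities predicted by a CNN:

```python
def wendland_c2(r):
    """Wendland C2 compactly supported RBF: nonzero only for r < 1,
    with value 1 at r = 0."""
    if r >= 1.0:
        return 0.0
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def displacement(x, control_points, coefficients, support):
    """1-D displacement at x as a sum of CSRBFs centred at control points,
    each scaled by its coefficient and its support radius."""
    return sum(c * wendland_c2(abs(x - p) / support)
               for p, c in zip(control_points, coefficients))
```

Because the kernel equals 1 at its centre and 0 beyond the support, an isolated control point moves exactly by its coefficient and leaves distant tissue untouched, which is what makes the deformation locally controllable.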
20
Ma X, Cui H, Li S, Yang Y, Xia Y. Deformable medical image registration with global-local transformation network and region similarity constraint. Comput Med Imaging Graph 2023; 108:102263. PMID: 37487363; DOI: 10.1016/j.compmedimag.2023.102263.
Abstract
Deformable medical image registration achieves fast and accurate alignment between two images, enabling medical professionals to analyze images of different subjects in a unified anatomical space, and thus plays an important role in many medical image studies. Current deep learning (DL)-based approaches directly learn the spatial transformation from one image to another, relying on a convolutional neural network and ground truth or similarity metrics. However, these methods use only a global similarity energy function to evaluate the similarity of a pair of images, which ignores the similarity of regions of interest (ROIs) within the images. This can limit registration accuracy and affect the analysis of specific ROIs. Additionally, DL-based methods often estimate the global spatial transformation directly, without considering the local spatial transformations of ROIs. To address these issues, we propose a novel global-local transformation network with a region similarity constraint that maximizes the similarity of ROIs within the images and estimates global and local spatial transformations simultaneously. Experiments conducted on four public 3D MRI datasets demonstrate that the proposed method achieves the highest registration performance in terms of accuracy and generalization compared to other state-of-the-art methods.
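A region similarity constraint can be pictured as evaluating a standard similarity metric, for example normalized cross-correlation, restricted to an ROI mask in addition to the whole image. The sketch below shows only that restriction, not the paper's actual energy function:

```python
def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists:
    +1 for perfectly correlated, -1 for perfectly anti-correlated."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a)
           * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def roi_ncc(a, b, mask):
    """Similarity evaluated only inside a region of interest: keep the
    intensities where the mask is set, then score them as usual."""
    ra = [x for x, m in zip(a, mask) if m]
    rb = [y for y, m in zip(b, mask) if m]
    return ncc(ra, rb)
```

Combining a global term with per-ROI terms like this lets the optimizer trade off whole-image alignment against the alignment of the structures one actually cares about.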
Affiliation(s)
- Xinke Ma (National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China)
- Hengfei Cui (National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China)
- Shuoyan Li (National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China)
- Yibo Yang (King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia)
- Yong Xia (National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China)
21
Guo H, Xu X, Song X, Xu S, Chao H, Myers J, Turkbey B, Pinto PA, Wood BJ, Yan P. Ultrasound Frame-to-Volume Registration via Deep Learning for Interventional Guidance. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 2023; 70:1016-1025. PMID: 37015418; PMCID: PMC10502768; DOI: 10.1109/tuffc.2022.3229903.
Abstract
Fusing intraoperative 2-D ultrasound (US) frames with preoperative 3-D magnetic resonance (MR) images for guiding interventions has become the clinical gold standard in image-guided prostate cancer biopsy. However, developing an automatic image registration system for this application is challenging because of the modality gap between US/MR and the dimensionality gap between 2-D/3-D data. To overcome these challenges, we propose a novel US frame-to-volume registration (FVReg) pipeline to bridge the dimensionality gap between 2-D US frames and 3-D US volume. The developed pipeline is implemented using deep neural networks, which are fully automatic without requiring external tracking devices. The framework consists of three major components: 1) a frame-to-frame registration network (Frame2Frame) that estimates the current frame's 3-D spatial position based on previous video context; 2) a frame-to-slice correction network (Frame2Slice) that adjusts the estimated frame position using the 3-D US volumetric information; and 3) a similarity filtering (SF) mechanism that selects the frame with the highest image similarity to the query frame. We validated our method on a clinical dataset with 618 subjects and tested its potential on real-time 2-D-US to 3-D-MR fusion navigation tasks. The proposed FVReg achieved an average target navigation error of 1.93 mm at 5-14 fps. Our source code is publicly available at https://github.com/DIAL-RPI/Frame-to-Volume-Registration.
22
Zhang J, Jiao L, Ma W, Liu F, Liu X, Li L, Zhu H. RDLNet: A Regularized Descriptor Learning Network. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:5669-5681. PMID: 34878982; DOI: 10.1109/tnnls.2021.3130655.
Abstract
Local image descriptor learning has been instrumental in various computer vision tasks. Recent innovations lie in similarity measurement of descriptor vectors via metric learning over randomly selected Siamese or triplet patches. Local image descriptor learning focuses more on hard samples, since easy samples do not contribute much to optimization; however, few studies address hard samples from the perspective of loss functions and design appropriate learning algorithms to obtain a more compact descriptor representation. This article proposes a regularized descriptor learning network (RDLNet) that makes a triplet network focus on the learning of hard samples and compact descriptors. A novel hard sample mining strategy is designed to select the hardest negative samples in each mini-batch, and a batch margin loss concerned with hard samples is adopted to optimize the distance of extreme cases. Finally, for a more stable network that avoids collapse, orthogonal regularization is designed to constrain the convolutional kernels and obtain rich deep features. RDLNet provides a compact, discriminative, low-dimensional representation and can easily be embedded in other pipelines. Extensive experimental results on large benchmarks in multiple scenarios, together with generalization tests in matching applications, show significant improvements.
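Hardest-negative mining within a batch, the baseline the described strategy builds on, keeps only the negative closest to the anchor when forming the triplet margin loss. A minimal sketch (the paper's batch margin loss is a refinement of this standard formulation):

```python
def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def hardest_negative_triplet_loss(anchor, positive, negatives, margin=1.0):
    """Triplet margin loss using the hardest (closest) negative in the batch.
    The loss is zero once the positive is closer than every negative by at
    least the margin."""
    d_pos = dist(anchor, positive)
    d_neg = min(dist(anchor, n) for n in negatives)  # hardest negative
    return max(0.0, d_pos - d_neg + margin)
```

Mining the closest negative concentrates the gradient on the cases the embedding currently gets wrong, which is why easy negatives contribute nothing once the margin is satisfied.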
23
Li J, Zhou YQ, Zhang QY. Metric networks for enhanced perception of non-local semantic information. Front Neurorobot 2023; 17:1234129. PMID: 37622128; PMCID: PMC10445135; DOI: 10.3389/fnbot.2023.1234129.
Abstract
Introduction: Metric learning, a fundamental research direction in computer vision, plays a crucial role in image matching. Traditional metric learning methods construct two-branch Siamese neural networks to address image matching, but they often overlook cross-source and cross-view scenarios.
Methods: In this article, a multi-branch metric learning model is proposed to address these limitations. The main contributions of this work are as follows. First, we design a multi-branch Siamese network model that enhances measurement reliability through information compensation among data points. Second, we construct a non-local information perception and fusion model, which accurately distinguishes positive and negative samples by fusing information at different scales. Third, we enhance the model by integrating semantic information and establish an information-consistency mapping between the branches, improving robustness in cross-source and cross-view scenarios.
Results: Experimental tests demonstrating the effectiveness of the proposed method are carried out under homologous, heterogeneous, multi-view, and cross-view conditions. Compared to state-of-the-art algorithms, the proposed algorithm improves the Recall@10 similarity measure by roughly 1, 2, 1, and 1%, respectively, under these four conditions.
Discussion: In addition, our work offers an approach to improving the cross-scene applicability of UAV positioning and navigation algorithms.
Affiliation(s)
- Yu-qian Zhou (College of Applied Mathematics, Chengdu University of Information Technology, Chengdu, Sichuan, China)
24
Xu M, Cao L, Lu D, Hu Z, Yue Y. Application of Swarm Intelligence Optimization Algorithms in Image Processing: A Comprehensive Review of Analysis, Synthesis, and Optimization. Biomimetics (Basel) 2023; 8:235. PMID: 37366829; DOI: 10.3390/biomimetics8020235.
Abstract
Image processing technology has always been a hot and difficult topic in artificial intelligence. With the rise of machine learning and deep learning methods, swarm intelligence algorithms have become a popular research direction, and combining image processing techniques with swarm intelligence algorithms has emerged as a new and effective approach. A swarm intelligence algorithm is an intelligent computing method formed by simulating the evolutionary laws, behavioral characteristics, and thinking patterns of insects, birds, natural phenomena, and other biological populations; it offers efficient, parallel global optimization capabilities and strong optimization performance. In this paper, the ant colony algorithm, particle swarm optimization, the sparrow search algorithm, the bat algorithm, the thimble colony algorithm, and other swarm intelligence optimization algorithms are studied in depth. The models, features, improvement strategies, and applications of these algorithms in image processing tasks such as image segmentation, image matching, image classification, image feature extraction, and image edge detection are comprehensively reviewed, and the corresponding theoretical research, improvement strategies, and applications are analyzed and compared against the current literature. Representative swarm intelligence algorithms combined with image segmentation techniques are tabulated and summarized. Finally, the unified framework, common characteristics, and differences of these swarm intelligence algorithms are summarized, open problems are raised, and future trends are projected.
Affiliation(s)
- Minghai Xu (School of Intelligent Manufacturing and Electronic Engineering, Wenzhou University of Technology, Wenzhou 325035, China)
- Li Cao (School of Intelligent Manufacturing and Electronic Engineering, Wenzhou University of Technology, Wenzhou 325035, China)
- Dongwan Lu (Intelligent Information Systems Institute, Wenzhou University, Wenzhou 325035, China)
- Zhongyi Hu (Intelligent Information Systems Institute, Wenzhou University, Wenzhou 325035, China)
- Yinggao Yue (School of Intelligent Manufacturing and Electronic Engineering, Wenzhou University of Technology, Wenzhou 325035, China; Intelligent Information Systems Institute, Wenzhou University, Wenzhou 325035, China)
25
Xu T, Yang X, Fu Z, Jin G, Chen W, Huang M, Lu G. Star map matching method for optical circular rotation imaging based on graph neural networks. J Opt Soc Am A Opt Image Sci Vis 2023; 40:1191-1200. PMID: 37706772; DOI: 10.1364/josaa.486401.
Abstract
This paper focuses on a dynamic star image acquisition and matching method for space situational awareness that can quickly search for widely distributed resident space objects. First, an optical circular rotation imaging method performed by a single space camera is proposed to obtain a series of star images. Then, an image matching method based on graph neural networks is proposed to generate a wide-view observation star image. Experimental results show that, compared with baseline matching algorithms, the matching accuracy and precision of the proposed algorithm are significantly improved.
26
Xie X, Xia F, Wu Y, Liu S, Yan K, Xu H, Ji Z. A Novel Feature Selection Strategy Based on Salp Swarm Algorithm for Plant Disease Detection. Plant Phenomics 2023; 5:0039. PMID: 37228513; PMCID: PMC10204742; DOI: 10.34133/plantphenomics.0039.
Abstract
Deep learning has been widely used for plant disease recognition in smart agriculture and has proven to be a powerful tool for image classification and pattern recognition, but its deep features have limited interpretability. With the transfer of expert knowledge, handcrafted features provide a new route to personalized diagnosis of plant diseases; however, irrelevant and redundant features lead to high dimensionality. In this study, we propose a swarm intelligence approach to feature selection, the salp swarm algorithm for feature selection (SSAFS), for image-based plant disease detection. SSAFS determines the ideal combination of handcrafted features that maximizes classification success while minimizing the number of features. To verify the effectiveness of the developed SSAFS algorithm, we conducted experimental studies comparing SSAFS with 5 metaheuristic algorithms. Several evaluation metrics were used to analyze the performance of these methods on 4 datasets from the UCI machine learning repository and 6 plant phenomics datasets from PlantVillage. Experimental results and statistical analyses validated the outstanding performance of SSAFS compared to existing state-of-the-art algorithms, confirming its superiority in exploring the feature space and identifying the most valuable features for diseased-plant image classification. This computational tool allows exploration of an optimal combination of handcrafted features, improving plant disease recognition accuracy and processing time.
Affiliation(s)
- Xiaojun Xie (College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China; Center for Data Science and Intelligent Computing, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China)
- Fei Xia (College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China)
- Yufeng Wu (State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China)
- Shouyang Liu (Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China)
- Ke Yan (Department of the Built Environment, College of Design and Engineering, National University of Singapore, 4 Architecture Drive, Singapore 117566, Singapore)
- Huanliang Xu (College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China)
- Zhiwei Ji (College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China; Center for Data Science and Intelligent Computing, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China)
27
Zhao B, Zhang K, Liu P, Chen Y. Large-scale time-lapse scanning electron microscopy image mosaic using a smooth stitching strategy. Microsc Res Tech 2023. PMID: 37119500; DOI: 10.1002/jemt.24334.
Abstract
Due to the trade-off between field of view and resolution in various microscopes, obtaining a wide-view panoramic image from high-resolution image tiles is frequently required in numerous applications. Here, we propose an automatic image mosaic strategy for sequential 2D time-lapse scanning electron microscopy (SEM) images. This method can accurately compute pairwise translations among serial image tiles with indeterminate overlapping areas. The detection and matching of feature points are constrained by geographical coordinates, avoiding accidental mismatching, and the nonlinear deformation of the mosaic is also taken into account. A smooth stitching field gradually transforms the perspective transformation in overlapping regions into the linear transformation in non-overlapping regions. Experimental results demonstrate better stitching accuracy than several other image mosaic algorithms. The method has potential applications in high-resolution large-area analysis using serial microscopy images.
RESEARCH HIGHLIGHTS:
- An automatic image mosaic strategy for processing sequential scanning electron microscopy images is proposed.
- A smooth stitching field is applied in the image mosaic.
- Improved stitching accuracy is achieved compared with conventional mosaic methods.
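Smooth stitching in an overlap can be illustrated in one dimension: blend weights ramp linearly across the shared pixels so that the transition between two tiles has no hard seam. A toy 1-D feathering sketch, not the paper's 2-D stitching field:

```python
def feather_blend(left, right, overlap):
    """Blend two 1-D intensity rows that share `overlap` pixels: the weight
    of the right tile grows linearly across the seam while the left tile's
    weight shrinks, removing any abrupt intensity jump."""
    lo = len(left) - overlap
    out = list(left[:lo])                     # left-only region
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)           # right-tile weight across seam
        out.append((1 - w) * left[lo + i] + w * right[i])
    out.extend(right[overlap:])               # right-only region
    return out
```

With constant tiles of intensity 10 and 20 and a 2-pixel overlap, the output climbs through intermediate values instead of jumping, which is the visible effect a smooth stitching field generalizes to 2-D warps.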
Collapse
Affiliation(s)
- Binglu Zhao
- Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, China
- Key Laboratory of Precision Scientific Instrumentation of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, China
| | - Kaidi Zhang
- Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, China
- Key Laboratory of Precision Scientific Instrumentation of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, China
| | - Peng Liu
- Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, China
- Key Laboratory of Precision Scientific Instrumentation of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, China
| | - Yuhang Chen
- Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei, China
- Key Laboratory of Precision Scientific Instrumentation of Anhui Higher Education Institutes, University of Science and Technology of China, Hefei, China
| |
Collapse
|
28
|
Iglesias JE. A ready-to-use machine learning tool for symmetric multi-modality registration of brain MRI. Sci Rep 2023; 13:6657. [PMID: 37095168 PMCID: PMC10126156 DOI: 10.1038/s41598-023-33781-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 04/26/2023] Open
Abstract
Volumetric registration of brain MRI is routinely used in human neuroimaging, e.g., to align different MRI modalities, to measure change in longitudinal analysis, to map an individual to a template, or in registration-based segmentation. Classical registration techniques based on numerical optimization have been very successful in this domain, and are implemented in widespread software suites like ANTs, Elastix, NiftyReg, or DARTEL. Over the last 7-8 years, learning-based techniques have emerged with a number of advantages, like high computational efficiency, potential for higher accuracy, easy integration of supervision, and the ability to be part of meta-architectures. However, their adoption in neuroimaging pipelines has so far been almost nonexistent. Reasons include: lack of robustness to changes in MRI modality and resolution; lack of robust affine registration modules; lack of (guaranteed) symmetry; and, at a more practical level, the requirement of deep learning expertise that may be lacking at neuroimaging research sites. Here, we present EasyReg, an open-source, learning-based registration tool that can be easily used from the command line without any deep learning expertise or specific hardware. EasyReg combines the features of classical registration tools, the capabilities of modern deep learning methods, and the robustness to changes in MRI modality and resolution provided by our recent work in domain randomization. As a result, EasyReg is fast; symmetric; diffeomorphic (and thus invertible); agnostic to MRI modality and resolution; compatible with affine and nonlinear registration; and requires no preprocessing or parameter tuning. We present results on challenging registration tasks, showing that EasyReg is as accurate as classical methods when registering 1 mm isotropic scans within MRI modality, but much more accurate across modalities and resolutions.
EasyReg is publicly available as part of FreeSurfer; see https://surfer.nmr.mgh.harvard.edu/fswiki/EasyReg .
Collapse
Affiliation(s)
- Juan Eugenio Iglesias
- Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, 02129, USA.
- Department of Medical Physics and Biomedical Engineering, University College London, London, WC1V 6LJ, UK.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Boston, 02139, USA.
| |
Collapse
|
29
|
A survey of feature detection methods for localisation of plain sections of axial brain magnetic resonance imaging. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
|
30
|
Xia X, Xiang H, Cao Y, Ge Z, Jiang Z. Feature Extraction and Matching of Humanoid-Eye Binocular Images Based on SUSAN-SIFT Algorithm. Biomimetics (Basel) 2023; 8:biomimetics8020139. [PMID: 37092391 PMCID: PMC10123616 DOI: 10.3390/biomimetics8020139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Revised: 03/24/2023] [Accepted: 03/27/2023] [Indexed: 04/03/2023] Open
Abstract
Imitating the visual characteristics of human eyes is one of the important tasks of digital image processing and computer vision. Feature correspondence of humanoid-eye binocular images is a prerequisite for obtaining the fused image. Human eyes are more sensitive to edge, because it contains much information. However, existing matching methods usually fail in producing enough edge corresponding pairs for humanoid-eye images because of viewpoint and view direction differences. To this end, we propose a novel and effective feature matching algorithm based on edge points. The proposed method consists of four steps. First, the SUSAN operator is employed to detect features, for its outstanding edge feature extraction capability. Second, the input image is constructed into a multi-scale structure based on image pyramid theory, which is then used to compute simplified SIFT descriptors for all feature points. Third, a novel multi-scale descriptor is constructed, by stitching the simplified SIFT descriptor of each layer. Finally, the similarity of multi-scale descriptors is measured by bidirectional matching, and the obtained preliminary matches are refined by subsequent procedures, to achieve accurate matching results. We respectively conduct qualitative and quantitative experiments, which demonstrate that our method can robustly match feature points in humanoid-eye binocular image pairs, and achieve favorable performance under illumination changes compared to the state-of-the-art.
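The bidirectional-matching step in the abstract above is, in its generic form, mutual nearest-neighbor filtering: a descriptor pair is kept only if each descriptor is the other's closest match. A minimal numpy sketch on toy descriptor arrays (not the authors' implementation):

```python
import numpy as np

def bidirectional_match(desc1, desc2):
    """Mutual nearest-neighbor matching between two descriptor sets.

    desc1: (N, D) array, desc2: (M, D) array. Returns index pairs (i, j)
    that are each other's nearest neighbors under Euclidean distance.
    """
    # pairwise Euclidean distances between the two descriptor sets
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn12 = d.argmin(axis=1)   # best match in set 2 for each point in set 1
    nn21 = d.argmin(axis=0)   # best match in set 1 for each point in set 2
    # keep a pair only if the choice is mutual
    return [(i, int(j)) for i, j in enumerate(nn12) if nn21[j] == i]
```

In practice this preliminary match list is then refined by geometric-consistency checks, as the paper describes.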
Collapse
Affiliation(s)
- Xiaohua Xia
- Key Laboratory of Road Construction Technology and Equipment of MOE, Chang’an University, Xi’an 710064, China
- Correspondence:
| | - Haoming Xiang
- Key Laboratory of Road Construction Technology and Equipment of MOE, Chang’an University, Xi’an 710064, China
| | - Yusong Cao
- Key Laboratory of Road Construction Technology and Equipment of MOE, Chang’an University, Xi’an 710064, China
| | - Zhaokai Ge
- Key Laboratory of Road Construction Technology and Equipment of MOE, Chang’an University, Xi’an 710064, China
| | - Zainan Jiang
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, Harbin 150001, China;
| |
Collapse
|
31
|
Awan R, Raza SEA, Lotz J, Weiss N, Rajpoot N. Deep feature based cross-slide registration. Comput Med Imaging Graph 2023; 104:102162. [PMID: 36584537 DOI: 10.1016/j.compmedimag.2022.102162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2022] [Revised: 11/15/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Registration of multiple sections in a tissue block is an important pre-requisite task before any cross-slide image analysis. Non-rigid registration methods are capable of finding correspondence by locally transforming a moving image. These methods often rely on an initial guess to roughly align an image pair linearly and globally. This is essential to prevent convergence to a non-optimal minimum. We explore a deep feature based registration (DFBR) method which utilises data-driven descriptors to estimate the global transformation. A multi-stage strategy is adopted for improving the quality of registration. A visualisation tool is developed to view registered pairs of WSIs at different magnifications. With the help of this tool, one can apply a transformation on the fly without the need to generate a transformed moving WSI in a pyramidal form. We compare the performance on our dataset of data-driven descriptors with that of hand-crafted descriptors. Our approach can align the images with only small registration errors. The efficacy of our proposed method is evaluated for a subsequent non-rigid registration step. To this end, the first two steps of the ANHIR winner's framework are replaced with DFBR to register image pairs provided by the challenge. The modified framework produce comparable results to those of the challenge winning team.
Collapse
Affiliation(s)
- Ruqayya Awan
- Department of Computer Science, University of Warwick, CV4 7AL Coventry, UK.
| | - Shan E Ahmed Raza
- Department of Computer Science, University of Warwick, CV4 7AL Coventry, UK.
| | - Johannes Lotz
- Fraunhofer Institute for Digital Medicine MEVIS, Lübeck, Germany.
| | - Nick Weiss
- Fraunhofer Institute for Digital Medicine MEVIS, Lübeck, Germany.
| | - Nasir Rajpoot
- Department of Computer Science, University of Warwick, CV4 7AL Coventry, UK; Department of Pathology, University Hospitals Coventry, Warwickshire, UK; The Alan Turing Institute, London, UK.
| |
Collapse
|
32
|
Feature Point Detection and Description Networks Based on Asymmetric Convolution and the Cross-Resolution Image-Matching Method. INT J INTELL SYST 2023. [DOI: 10.1155/2023/5131440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/22/2023]
Abstract
Image matching can be transformed into the problem of feature point detection and matching. Current neural network methods have a weak feature point detection effect and cannot extract enough sparse and uniform feature points. In order to improve the detection and description of feature points, this paper proposes ACPoint, a self-supervised feature point detection and description network based on asymmetric convolution. Specifically, feature point pseudolabels are first learned from an unlabeled dataset and used for supervised learning; the learned model is then used to update the pseudolabels. Through multiple iterations of model training and label updating, high-quality labels and a high-accuracy model are obtained adaptively. The asymmetric convolution feature point (ACPoint) network adopts an asymmetric convolution module that trains three convolution branches simultaneously to learn more feature information: two one-dimensional convolutions strengthen the square-convolution backbone in the horizontal and vertical directions and improve the representation of local features during inference. Based on the ACPoint network, a cross-resolution image-matching method is proposed. Experiments show that our proposed network model achieves higher localization accuracy and homography estimation ability on the HPatches dataset.
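The key property behind asymmetric convolution blocks of this kind is that, by linearity, the three trained branches (square, horizontal 1-D, vertical 1-D) can be fused into a single square kernel at inference. A small numpy demonstration of that fusion property on a synthetic input (illustrative only, not the ACPoint code):

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 'valid' 2D cross-correlation (no padding, stride 1)."""
    kh, kw = k.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k) for j in range(W)]
                     for i in range(H)])

rng = np.random.default_rng(0)
x = rng.random((8, 8))
k33 = rng.random((3, 3))                             # square branch
k13 = np.pad(rng.random((1, 3)), ((1, 1), (0, 0)))   # horizontal branch, centred
k31 = np.pad(rng.random((3, 1)), ((0, 0), (1, 1)))   # vertical branch, centred

# training-time view: three parallel branches, outputs summed
branches = conv2d_valid(x, k33) + conv2d_valid(x, k13) + conv2d_valid(x, k31)
# inference-time view: one fused 3x3 kernel (by linearity of convolution)
fused = conv2d_valid(x, k33 + k13 + k31)
print(np.allclose(branches, fused))  # True
```

This is why the extra branches add representation power during training without any inference-time cost beyond a single square convolution.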
Collapse
|
33
|
Feed-Forward Deep Neural Network (FFDNN)-Based Deep Features for Static Malware Detection. INT J INTELL SYST 2023. [DOI: 10.1155/2023/9544481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/22/2023]
Abstract
The portable executable header (PEH) information is commonly used as a feature for malware detection systems to train and validate machine learning (ML) or deep learning (DL) classifiers. We propose to extract deep features from the PEH information through the hidden layers of a feed-forward deep neural network (FFDNN). The extraction of deep features from hidden layers represents the dataset with better generalization for malware detection. While feeding the deep features of one hidden layer to the succeeding layer, the Gaussian error linear unit (GeLU) activation function is applied. The FFDNN is trained with the GeLU activation function using the deep features of individual layers as well as the concatenated deep features of all hidden layers. Similarly, the ML classifiers are also trained and validated with individual-layer deep features and concatenated features. Three highly effective ML classifiers, random forest (RF), support vector machine (SVM), and k-nearest neighbour (k-NN), have been investigated. The performance of the proposed model is demonstrated on a statistically significant, large dataset. The obtained results are encouraging in terms of classification accuracy, which reaches 99.15% with the internal discriminative deep features for the proposed FFDNN-ML classifier with the GeLU activation function.
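The GeLU activation mentioned above has the standard definition GeLU(x) = x·Φ(x), where Φ is the standard normal CDF, computable exactly via the error function. A minimal reference implementation (a generic sketch of the activation itself, unrelated to the paper's network weights):

```python
import math

def gelu(x):
    """Exact Gaussian error linear unit: x * Phi(x), via the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

Unlike ReLU, GeLU is smooth and weights inputs by their probability under a standard normal, e.g. gelu(1.0) ≈ 0.841, and it approaches the identity for large positive inputs.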
Collapse
|
34
|
Liu L, Aitken JM. HFNet-SLAM: An Accurate and Real-Time Monocular SLAM System with Deep Features. SENSORS (BASEL, SWITZERLAND) 2023; 23:2113. [PMID: 36850708 PMCID: PMC9965254 DOI: 10.3390/s23042113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/05/2022] [Revised: 02/05/2023] [Accepted: 02/06/2023] [Indexed: 06/18/2023]
Abstract
Image tracking and retrieval strategies are of vital importance in visual Simultaneous Localization and Mapping (SLAM) systems. For most state-of-the-art systems, hand-crafted features and bag-of-words (BoW) algorithms are the common solutions. Recent research reports the vulnerability of these traditional algorithms in complex environments. To replace these methods, this work proposes HFNet-SLAM, an accurate and real-time monocular SLAM system built on the ORB-SLAM3 framework and incorporating deep convolutional neural networks (CNNs). This work provides a pipeline of feature extraction, keypoint matching, and loop detection fully based on features from CNNs. The performance of this system has been validated on public datasets against other state-of-the-art algorithms. The results reveal that HFNet-SLAM achieves the lowest errors among systems available in the literature. Notably, HFNet-SLAM obtains an average accuracy of 2.8 cm on the EuRoC dataset in a pure visual configuration. Moreover, it doubles the accuracy in medium and large environments on the TUM-VI dataset compared with ORB-SLAM3. Furthermore, with the optimisation of TensorRT technology, the entire system can run in real time at 50 FPS.
Collapse
|
35
|
Bellavia F. SIFT Matching by Context Exposed. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:2445-2457. [PMID: 35320089 DOI: 10.1109/tpami.2022.3161853] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
This paper investigates how to improve local image descriptor matching by exploiting matching context information. Two main contexts are identified, originating respectively from the descriptor space and from the keypoint space. The former is generally used to design the actual matching strategy, while the latter is used to filter matches according to local spatial consistency. On this basis, a new matching strategy and a novel local spatial filter, named respectively blob matching and Delaunay Triangulation Matching (DTM), are devised. Blob matching provides a general matching framework by merging together several strategies, including rank-based pre-filtering as well as many-to-many and symmetric matching, enabling a global improvement over each individual strategy. DTM alternates between Delaunay triangulation contractions and expansions to determine and adjust keypoint neighborhood consistency. Experimental evaluation shows that DTM is comparable to or better than the state of the art in terms of matching accuracy and robustness. Evaluation is carried out according to a new benchmark devised for analyzing the matching pipeline in terms of correct correspondences on both planar and non-planar scenes, including several state-of-the-art methods as well as the common SIFT matching approach for reference. This evaluation can be of assistance for future research in this field.
Collapse
|
36
|
Deng Y, Ma J. ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2022; PP:591-602. [PMID: 37015497 DOI: 10.1109/tip.2022.3231135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Deep-learning-based local feature extraction algorithms that combine detection and description have made significant progress in visible image matching. However, the end-to-end training of such frameworks is notoriously unstable due to the lack of strong supervision of detection and the inappropriate coupling between detection and description. The problem is magnified in cross-modal scenarios, in which most methods heavily rely on pre-training. In this paper, we recouple independent constraints of detection and description in multimodal feature learning with a mutual weighting strategy, in which the detected probabilities of robust features are forced to peak and repeat, while features with high detection scores are emphasized during optimization. Different from previous works, those weights are detached from backpropagation so that the detected probability of indistinct features is not directly suppressed and the training is more stable. Moreover, we propose the Super Detector, a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers, to fulfill the harsh terms of detection. Finally, we build a benchmark that contains cross visible, infrared, near-infrared, and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks. Extensive experiments demonstrate that features trained with the recoupled detection and description, named ReDFeat, surpass the previous state of the art in the benchmark, while the model can be readily trained from scratch. The code is released at https://github.com/ACuOoOoO/ReDFeat.
Collapse
|
37
|
Shu G, Shan Z, Di S, Ding X, Feng C. A Hybrid Quantum Image-Matching Algorithm. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1816. [PMID: 36554224 PMCID: PMC9777705 DOI: 10.3390/e24121816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/20/2022] [Revised: 12/09/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]
Abstract
Image matching is an important research topic in computer vision and image processing. However, existing quantum algorithms mainly focus on exact matching between template pixels and are not robust to changes in image location and scale. In addition, the similarity calculation in the matching process is a fundamentally important issue. Therefore, this paper proposes a hybrid quantum algorithm, which uses the robustness of SIFT (scale-invariant feature transform) to extract image features, and combines the advantages of quantum exponential storage and parallel computing to represent data and calculate feature similarity. Finally, quantum amplitude estimation is used to extract the measurement results and achieve a quadratic speedup of the calculation. The experimental results show that the matching effect of this algorithm is better than that of the existing classical architecture. Our hybrid algorithm broadens the application scope of quantum computing in image processing.
Collapse
|
38
|
Jian B, Ma C, Zhu D, Huang Q, Ao J. Water-Air Interface Imaging: Recovering the Images Distorted by Surface Waves via an Efficient Registration Algorithm. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1765. [PMID: 36554170 PMCID: PMC9777829 DOI: 10.3390/e24121765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/09/2022] [Revised: 11/28/2022] [Accepted: 11/29/2022] [Indexed: 06/17/2023]
Abstract
Imaging through the wavy water-air interface is challenging since the random fluctuations of water will cause complex geometric distortion and motion blur in the images, seriously affecting the effective identification of the monitored object. Considering the problems of image recovery accuracy and computational efficiency, an efficient reconstruction scheme that combines lucky-patch search and image registration technologies was proposed in this paper. Firstly, a high-quality reference frame is rebuilt using a lucky-patch search strategy. Then an iterative registration algorithm is employed to remove severe geometric distortions by registering warped frames to the reference frame. During the registration process, we integrate JADE and LBFGS algorithms as an optimization strategy to expedite the control parameter optimization process. Finally, the registered frames are refined using PCA and the lucky-patch search algorithm to remove residual distortions and random noise. Experimental results demonstrate that the proposed method significantly outperforms the state-of-the-art methods in terms of sharpness and contrast.
Collapse
Affiliation(s)
- Bijian Jian
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
- School of Artificial Intelligence, Hezhou University, Hezhou 542800, China
| | - Chunbo Ma
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
| | - Dejian Zhu
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
| | - Qihong Huang
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
| | - Jun Ao
- School of Information and Communication, Guilin University of Electronic Technology, Guilin 541000, China
| |
Collapse
|
39
|
Diaz-Ramirez VH, Gonzalez-Ruiz M, Kober V, Juarez-Salazar R. Stereo Image Matching Using Adaptive Morphological Correlation. SENSORS (BASEL, SWITZERLAND) 2022; 22:9050. [PMID: 36501752 PMCID: PMC9737403 DOI: 10.3390/s22239050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 11/17/2022] [Accepted: 11/20/2022] [Indexed: 06/17/2023]
Abstract
A stereo matching method based on adaptive morphological correlation is presented. The point correspondences of an input pair of stereo images are determined by matching locally adaptive image windows using the suggested morphological correlation, which is optimal with respect to an introduced binary dissimilarity-to-matching ratio criterion. The proposed method is capable of determining point correspondences in homogeneous image regions and at the edges of scene objects with high accuracy. Furthermore, unknown correspondences of occluded and unmatched points in the scene can be successfully recovered using a simple proposed post-processing step. The performance of the proposed method is exhaustively tested for stereo matching in terms of objective measures using known database images. In addition, the obtained results are discussed and compared with those of two similar state-of-the-art methods.
Collapse
Affiliation(s)
- Victor H. Diaz-Ramirez
- Instituto Politécnico Nacional-CITEDI, Instituto Politécnico Nacional 1310, Tijuana 22310, BC, Mexico
| | - Martin Gonzalez-Ruiz
- Instituto Politécnico Nacional-CITEDI, Instituto Politécnico Nacional 1310, Tijuana 22310, BC, Mexico
| | - Vitaly Kober
- Department of Computer Science, CICESE, Ensenada 22860, BC, Mexico
- Department of Mathematics, Chelyabinsk State University, 454001 Chelyabinsk, Russia
| | - Rigoberto Juarez-Salazar
- CONACYT-Instituto Politécnico Nacional, CITEDI, Instituto Politécnico Nacional 1310, Tijuana 22310, BC, Mexico
| |
Collapse
|
40
|
Lin Y, Wu F, Zhao J. Reinforcement learning-based image exposure reconstruction for homography estimation. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04287-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
41
|
Yang L, Guo X, Song X, Lu D, Cai W, Xiong Z. An Improved Human-Body-Segmentation Algorithm with Attention-Based Feature Fusion and a Refined Stereo-Matching Scheme Working at the Sub-Pixel Level for the Anthropometric System. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1647. [PMID: 36421502 PMCID: PMC9689509 DOI: 10.3390/e24111647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 11/07/2022] [Accepted: 11/11/2022] [Indexed: 06/16/2023]
Abstract
This paper proposes an improved human-body-segmentation algorithm with attention-based feature fusion and a refined corner-based feature-point design with sub-pixel stereo matching for the anthropometric system. In the human-body-segmentation algorithm, four CBAMs are embedded in the four middle convolution layers of the backbone network (ResNet101) of PSPNet to achieve better feature fusion in space and channels, so as to improve accuracy. The common convolution in the residual blocks of ResNet101 is substituted by group convolution to reduce model parameters and computational cost, thereby optimizing efficiency. For the stereo-matching scheme, a corner-based feature point is designed to obtain the feature-point coordinates at sub-pixel level, so that precision is refined. A regional constraint is applied according to the characteristic of the checkerboard corner points, thereby reducing complexity. Experimental results demonstrated that the anthropometric system with the proposed CBAM-based human-body-segmentation algorithm and corner-based stereo-matching scheme can significantly outperform the state-of-the-art system in accuracy. It can also meet the national standards GB/T 2664-2017, GA 258-2009 and GB/T 2665-2017; and the textile industry standards FZ/T 73029-2019, FZ/T 73017-2014, FZ/T 73059-2017 and FZ/T 73022-2019.
Collapse
Affiliation(s)
- Lei Yang
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
| | - Xiaoyu Guo
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
| | - Xiaowei Song
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
- Dongjing Avenue Campus, Kaifeng University, Kaifeng 475004, China
| | - Deyuan Lu
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
| | - Wenjing Cai
- School of Electronic and Information, Zhongyuan University of Technology, Zhengzhou 450007, China
| | - Zixiang Xiong
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
| |
Collapse
|
42
|
Fan A, Ma J, Jiang X, Ling H. Efficient Deterministic Search With Robust Loss Functions for Geometric Model Fitting. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:8212-8229. [PMID: 34473624 DOI: 10.1109/tpami.2021.3109784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Geometric model fitting is a fundamental task in computer vision, which serves as the prerequisite of many downstream applications. While the problem has a simple intrinsic structure where the solution can be parameterized within a few degrees of freedom, the ubiquitous outliers are the main challenge. In previous studies, random sampling techniques have been established as the practical choice, since optimization-based methods are usually too time-demanding. This study instead designs efficient algorithms that benefit from a general optimization-based view. In particular, two important types of loss functions are discussed, i.e., truncated and l1 losses, and efficient solvers are derived for both under specific approximations. Based on this philosophy, a class of algorithms is introduced to perform deterministic search for the inliers or geometric model. Recommendations are made based on theoretical and experimental analyses. Compared with existing solutions, the proposed methods are both simple in computation and robust to outliers. Extensive experiments are conducted on publicly available datasets for geometric estimation, which demonstrate the superiority of our methods compared with state-of-the-art ones. Additionally, we apply our method to the recent benchmark for wide-baseline stereo evaluation, leading to a significant improvement in performance. Our code is publicly available at https://github.com/AoxiangFan/EifficientDeterministicSearch.
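The optimization-based view above can be illustrated on the simplest geometric model, a 2D line, fitted deterministically under a truncated quadratic loss via iteratively reweighted least squares with an annealed truncation threshold. This is an illustrative sketch of the general idea on synthetic data, not the paper's solvers:

```python
import numpy as np

def fit_line_truncated(x, y, tau=0.05, iters=20):
    """Deterministic robust fit of y ~ a*x + b under a truncated quadratic
    loss rho(r) = min(r^2, t^2), solved by iteratively reweighted least
    squares while annealing the truncation threshold t down to tau."""
    A = np.stack([x, np.ones_like(x)], axis=1)
    theta = np.linalg.lstsq(A, y, rcond=None)[0]       # plain LS initialization
    for t in np.geomspace(1.0, tau, iters):            # shrink the threshold
        r = A @ theta - y
        w = (np.abs(r) <= t).astype(float)             # current inliers get weight 1
        if w.sum() < 2:
            break
        # zeroed rows drop out of the weighted least-squares refit
        theta = np.linalg.lstsq(A * w[:, None], w * y, rcond=None)[0]
    return theta

# demo: inliers on y = 2x + 1 plus gross uniform outliers
rng = np.random.default_rng(1)
x = np.concatenate([rng.uniform(0, 1, 200), rng.uniform(0, 1, 80)])
y = np.concatenate([2 * x[:200] + 1 + rng.normal(0, 0.01, 200),
                    rng.uniform(-5, 5, 80)])
slope, intercept = fit_line_truncated(x, y)
```

Because every step is a deterministic reweighted refit, the result is reproducible, in contrast to random-sampling consensus schemes.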
Collapse
|
43
|
Robust two-phase registration method for three-dimensional point set under the Bayesian mixture framework. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01673-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
44
|
Guo Y, Zhao L, Shi Y, Zhang X, Du S, Wang F. Adaptive weighted robust iterative closest point. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.08.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
45
|
Chi C, Hao T, Wang Q, Guo P, Yang X. Subspace-PnP: A Geometric Constraint Loss for Mutual Assistance of Depth and Optical Flow Estimation. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01652-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
46
|
Bruefach A, Ophus C, Scott MC. Analysis of Interpretable Data Representations for 4D-STEM Using Unsupervised Learning. MICROSCOPY AND MICROANALYSIS : THE OFFICIAL JOURNAL OF MICROSCOPY SOCIETY OF AMERICA, MICROBEAM ANALYSIS SOCIETY, MICROSCOPICAL SOCIETY OF CANADA 2022; 28:1-11. [PMID: 36073035 DOI: 10.1017/s1431927622012259] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Understanding the structure of materials is crucial for engineering devices and materials with enhanced performance. Four-dimensional scanning transmission electron microscopy (4D-STEM) is capable of mapping nanometer-scale local crystallographic structure over micron-scale fields of view. However, 4D-STEM datasets can contain tens of thousands of images from a wide variety of material structures, making it difficult to automate the detection and classification of structures. Traditional automated analysis pipelines for 4D-STEM focus on supervised approaches, which require prior knowledge of the material structure and cannot describe anomalous or deviant structures. In this article, a pipeline for engineering 4D-STEM feature representations for unsupervised clustering using non-negative matrix factorization (NMF) is introduced. Each feature is evaluated using NMF, and results are presented for both simulated and experimental data. It is shown that some data representations more reliably identify overlapping grains. Additionally, real-space refinement is applied to identify spatially distinct sample regions, allowing size and shape analysis to be performed. This work lays the foundation for improved analysis of nanoscale structural features in materials that deviate from expected crystallographic arrangements using 4D-STEM.
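The NMF step at the heart of the pipeline above factors a non-negative data matrix V into non-negative factors W and H, whose columns/rows act as interpretable components. A minimal numpy sketch using the classic Lee-Seung multiplicative updates on a synthetic low-rank matrix (illustrative; the paper's pipeline builds V from engineered 4D-STEM features):

```python
import numpy as np

def nmf(V, k, iters=1000, seed=0):
    """Lee-Seung multiplicative-update NMF: V ~= W @ H with W, H >= 0,
    minimizing the Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 0.1
    H = rng.random((k, V.shape[1])) + 0.1
    eps = 1e-10                      # guard against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# demo: a noiseless rank-2 non-negative matrix is recovered almost exactly
rng = np.random.default_rng(3)
V = rng.random((12, 2)) @ rng.random((2, 10))
W, H = nmf(V, 2)
```

The multiplicative form keeps W and H non-negative throughout, which is what makes the learned components directly interpretable as additive parts.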
Collapse
Affiliation(s)
- Alexandra Bruefach, Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA
- Colin Ophus, National Center for Electron Microscopy, Molecular Foundry, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Mary C Scott, Department of Materials Science and Engineering, University of California, Berkeley, CA 94720, USA; National Center for Electron Microscopy, Molecular Foundry, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
Collapse
|
47
|
Wei G, Tian Y, Kaneko S, Jiang Z. Robust Template Matching Using Multiple-Layered Absent Color Indexing. SENSORS (BASEL, SWITZERLAND) 2022; 22:6661. [PMID: 36081120 PMCID: PMC9460572 DOI: 10.3390/s22176661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 08/03/2022] [Revised: 08/28/2022] [Accepted: 09/01/2022] [Indexed: 06/15/2023]
Abstract
Color is an essential feature in histogram-based matching and can be extracted as statistical data during the comparison process. Although the applicability of color features in histogram-based techniques has been proven, position information is lost during the matching process. We present a conceptually simple and effective template-matching method called multiple-layered absent color indexing (ABC-ML). Apparent and absent color histograms are obtained from the original color histogram, where the absent colors belong to low-frequency or vacant bins. To determine the color range of the compared images, we propose a total color space (TCS) that fixes the operating range of the histogram bins. Furthermore, we invert the absent colors using a threshold hT to exploit the properties of these colors, and then compute similarity using histogram intersection. A multiple-layered structure, with each layer constructed on the isotonic principle, is proposed to counter the shift issue inherent in histogram-based approaches. Absent color indexing and the multiple-layered structure are thus combined to solve the precision problem. Our experiments on real-world images and open data demonstrate state-of-the-art results while retaining the robustness of histogram-based methods under deformation and scaling.
Collapse
Affiliation(s)
- Guodong Wei, School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
- Ying Tian, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
- Shun’ichi Kaneko, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
- Zhengang Jiang, School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China
Collapse
|
48
|
Deng C, Chen S, Zhang Y, Zhang Q, Chen F. ULMR: An Unsupervised Learning Framework for Mismatch Removal. SENSORS (BASEL, SWITZERLAND) 2022; 22:6110. [PMID: 36015871 PMCID: PMC9413738 DOI: 10.3390/s22166110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/14/2022] [Revised: 08/12/2022] [Accepted: 08/13/2022] [Indexed: 06/15/2023]
Abstract
Due to radiometric and geometric distortions between images, mismatches are inevitable, so a mismatch removal process is required to improve matching accuracy. Although deep learning methods have been proven to outperform handcrafted methods in specific scenarios, including image identification and point cloud classification, most learning methods are supervised, are susceptible to incorrect labeling, and labeling data is a time-consuming task. This paper takes advantage of deep reinforcement learning (DRL) and proposes a framework named unsupervised learning for mismatch removal (ULMR). Resorting to DRL, ULMR first scores each state-action pair guided by the output of a classification network; it then calculates the policy gradient of the expected reward; finally, by maximizing the expected reward over state-action pairs, the optimal network can be obtained. Compared to supervised learning methods (e.g., NM-Net and LFGC), unsupervised learning methods (e.g., ULCM), and handcrafted methods (e.g., RANSAC and GMS), ULMR obtains higher precision, more remaining correct matches, and fewer remaining false matches in testing experiments. Moreover, ULMR shows greater stability, better accuracy, and higher quality in application experiments, and demonstrates reduced sampling times and good compatibility with other classification networks in ablation experiments, indicating its potential for further use.
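The DRL recipe in the abstract (score each state-action pair, compute the policy gradient of the expected reward, then maximize it) can be sketched with a toy REINFORCE loop. Everything here is synthetic and hypothetical: a logistic "classification network" over made-up correspondence features, with a reward that simply checks keep/reject decisions against ground truth, standing in for the paper's unsupervised reward.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy putative correspondences: one feature vector per match,
# with a latent label (1 = correct match) correlated with feature 0.
X = rng.normal(size=(300, 5))
true = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(float)

w = np.zeros(5)   # parameters of a logistic keep/reject policy
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w))            # P(keep | features)
    keep = (rng.random(300) < p).astype(float)  # sampled actions
    reward = np.where(keep == true, 1.0, -1.0)  # reward per state-action pair
    # REINFORCE: grad of log pi(a|s) weighted by (reward - baseline).
    grad = ((reward - reward.mean()) * (keep - p)) @ X / 300
    w += lr * grad

p = 1.0 / (1.0 + np.exp(-X @ w))
acc = ((p > 0.5) == true.astype(bool)).mean()
print(round(acc, 2))
```

The baseline subtraction (`reward - reward.mean()`) is a standard variance-reduction trick; without it the same loop still ascends the expected reward, just more noisily.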
Collapse
Affiliation(s)
- Cailong Deng, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
- Shiyu Chen, School of Geographic Sciences, Xinyang Normal University, Xinyang 464000, China; Henan Engineering Research Center for Big Data of Remote Sensing and Intelligent Analysis in Huaihe River Basin, Xinyang Normal University, Xinyang 464000, China; Key Laboratory for National Geographic Census and Monitoring, National Administration of Surveying, Mapping and Geoinformation, Wuhan University, Wuhan 430079, China
- Yong Zhang, Visiontek Research, 6 Phoenix Avenue, Wuhan 430205, China; School of Electronics and Information Engineering, Wuzhou University, Wuzhou 543003, China
- Qixin Zhang, School of Geographic Sciences, Xinyang Normal University, Xinyang 464000, China
- Feiyan Chen, School of Geographic Sciences, Xinyang Normal University, Xinyang 464000, China; Henan Engineering Research Center for Big Data of Remote Sensing and Intelligent Analysis in Huaihe River Basin, Xinyang Normal University, Xinyang 464000, China
Collapse
|
49
|
Liu X, Yuan D, Xue K, Li JB, Zhao H, Liu H, Wang T. Diffeomorphic matching with multiscale kernels based on sparse parameterization for cross-view target detection. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03668-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
50
|
Liu Y, Huang K, Li J, Li X, Zeng Z, Chang L, Zhou J. AdaSG: A Lightweight Feature Point Matching Method Using Adaptive Descriptor with GNN for VSLAM. SENSORS (BASEL, SWITZERLAND) 2022; 22:5992. [PMID: 36015753 PMCID: PMC9414433 DOI: 10.3390/s22165992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Revised: 08/05/2022] [Accepted: 08/08/2022] [Indexed: 06/15/2023]
Abstract
Feature point matching is a key component of visual simultaneous localization and mapping (VSLAM). Recently, neural networks have been employed in feature point matching to improve matching performance. Among state-of-the-art feature point matching methods, SuperGlue is one of the top performers and ranked first in the CVPR 2020 workshop on image matching. However, it relies on a graph neural network (GNN), resulting in large computational complexity, which makes it unsuitable for resource-constrained devices such as robots and mobile phones. In this work, we propose a lightweight feature point matching method based on SuperGlue (named AdaSG). Compared to SuperGlue, AdaSG adaptively adjusts its operating architecture according to the similarity of the input image pair, reducing computational complexity while achieving high matching performance. The proposed method has been evaluated on commonly used datasets covering indoor and outdoor environments. Compared with several state-of-the-art feature point matching methods, it achieves significantly lower runtime (up to 43× less for indoor and up to 6× for outdoor scenes) with similar or better matching performance, making it suitable for feature point matching on resource-constrained devices.
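The adaptive idea (spend heavy GNN compute only when the image pair needs it) can be illustrated with a cheap-first fallback scheme. This is a hypothetical sketch, not AdaSG's actual architecture switching: the function names, the similarity proxy (fraction of cheap mutual matches), and the threshold are all invented for illustration.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b, ratio=0.8):
    """Cheap matcher: mutual nearest neighbours plus Lowe's ratio test."""
    d = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=2)
    nn_ab = d.argmin(axis=1)
    nn_ba = d.argmin(axis=0)
    matches = []
    for i, j in enumerate(nn_ab):
        best2 = np.partition(d[i], 1)[:2]   # best and second-best distance
        if nn_ba[j] == i and best2[0] < ratio * best2[1]:
            matches.append((i, j))
    return matches

def adaptive_match(desc_a, desc_b, sim_thresh=0.2, heavy_matcher=None):
    """If the pair is easy (many cheap mutual matches survive), skip the
    heavy GNN stage; otherwise fall back to it."""
    cheap = mutual_nn_matches(desc_a, desc_b)
    if len(cheap) / max(len(desc_a), 1) >= sim_thresh or heavy_matcher is None:
        return cheap
    return heavy_matcher(desc_a, desc_b)

rng = np.random.default_rng(2)
da = rng.normal(size=(50, 32))
db = da + 0.05 * rng.normal(size=(50, 32))  # near-duplicate descriptors
print(len(adaptive_match(da, db)))
```

On a near-duplicate pair like this one, the cheap stage recovers essentially all matches, so the expensive matcher is never invoked, which is the runtime saving the abstract reports.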
Collapse
Affiliation(s)
- Ye Liu, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Kun Huang, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jingyuan Li, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiangting Li, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Zeng Zeng, School of Microelectronics, Shanghai University, Shanghai 200444, China
- Liang Chang, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Jun Zhou, School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Collapse
|