1. DeepMesh: Differentiable Iso-Surface Extraction. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; PP:1-15. PMID: 38648137; DOI: 10.1109/tpami.2024.3392291.
Abstract
Geometric Deep Learning has recently made striking progress with the advent of continuous deep implicit fields. They allow for detailed modeling of watertight surfaces of arbitrary topology while not relying on a 3D Euclidean grid, resulting in a learnable parameterization that is unlimited in resolution. Unfortunately, these methods are often unsuitable for applications that require an explicit mesh-based surface representation because converting an implicit field to such a representation relies on the Marching Cubes algorithm, which cannot be differentiated with respect to the underlying implicit field. In this work, we remove this limitation and introduce a differentiable way to produce explicit surface mesh representations from Deep Implicit Fields. Our key insight is that by reasoning on how implicit field perturbations impact local surface geometry, one can ultimately differentiate the 3D location of surface samples with respect to the underlying deep implicit field. We exploit this to define DeepMesh - an end-to-end differentiable mesh representation that can vary its topology. We validate our theoretical insight through several applications: Single view 3D Reconstruction via Differentiable Rendering, Physically-Driven Shape Optimization, Full Scene 3D Reconstruction from Scans and End-to-End Training. In all cases our end-to-end differentiable parameterization gives us an edge over state-of-the-art algorithms.
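The key insight can be illustrated with a toy sketch (a hypothetical illustration, not the paper's implementation): for a sample p on the zero level set of an implicit field s(p; theta), perturbing the field parameters moves p along the surface normal by -(ds/dtheta) * dtheta / |grad s|. For a sphere signed distance function whose only parameter is its radius r:

```python
import numpy as np

# Illustrative sketch (not the authors' code): a field perturbation ds
# moves a surface sample along the normal by -ds / |grad s|.

def sdf_sphere(p, r):
    """Signed distance to a sphere of radius r centered at the origin."""
    return np.linalg.norm(p) - r

def surface_shift(p, r, dr, eps=1e-6):
    """Normal displacement of surface sample p when the radius parameter
    changes by dr, using dp = -(ds/dr) * dr / |grad s|."""
    ds_dr = (sdf_sphere(p, r + eps) - sdf_sphere(p, r)) / eps  # numerically -1
    grad = p / np.linalg.norm(p)  # gradient of the sphere SDF at p
    return -ds_dr * dr / np.linalg.norm(grad)

p = np.array([1.0, 0.0, 0.0])            # sample on the unit sphere
shift = surface_shift(p, r=1.0, dr=0.01)
print(round(shift, 4))                   # 0.01: surface moves outward by dr
```

Here the surface sample moves outward by exactly the radius increment, as the closed-form relation predicts, without ever differentiating through Marching Cubes.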
2. A Closed-Form, Pairwise Solution to Local Non-Rigid Structure-from-Motion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; PP:1-14. PMID: 38578851; DOI: 10.1109/tpami.2024.3383316.
Abstract
A recent trend in Non-Rigid Structure-from-Motion (NRSfM) is to express local, differential constraints between pairs of images, from which the surface normal at any point can be obtained by solving a system of polynomial equations. While this approach is more successful than its counterparts relying on global constraints, the resulting methods face two main problems. First, most of the equation systems they formulate are of high degree and must be solved using computationally expensive polynomial solvers. Some methods use polynomial reduction strategies to simplify the system, but this adds phantom solutions. In any event, an additional mechanism is employed to pick the best solution, which adds to the computation without any guarantees on the reliability of the solution. Second, these methods formulate constraints between a pair of images. Even if there is enough motion between them, they may suffer from local degeneracies that make the resulting estimates unreliable, without any warning mechanism. In this paper, we solve these problems for isometric/conformal NRSfM. We show that, under widely applicable assumptions, we can derive a new system of equations in terms of the surface normals, whose two solutions can be obtained in closed form and can easily be disambiguated locally. Our formalism also allows us to assess how reliable the estimated local normals are and to discard them if they are not. Our experiments show that our reconstructions, obtained from two or more views, are significantly more accurate than those of state-of-the-art methods, while also being faster.
3. Detecting Road Obstacles by Erasing Them. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024; 46:2450-2460. PMID: 38019625; DOI: 10.1109/tpami.2023.3335152.
Abstract
Vehicles can encounter a myriad of obstacles on the road, and it is impossible to record them all beforehand to train a detector. Instead, we select image patches and inpaint them with the surrounding road texture, which tends to remove obstacles from those patches. We then use a network trained to recognize discrepancies between the original patch and the inpainted one; such a discrepancy signals an erased obstacle.
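The erase-and-compare idea can be sketched in a few lines (a toy stand-in: mean filling replaces the inpainting network, and a simple threshold replaces the trained discrepancy network):

```python
import numpy as np

# Toy sketch of erase-and-compare obstacle detection (hypothetical,
# simplified): replace a patch with the surrounding "road" texture and
# flag an obstacle when the original differs strongly from the result.

def inpaint_with_surround(img, y0, y1, x0, x1):
    """Crudely 'inpaint' the patch with the mean of its surroundings."""
    out = img.copy()
    mask = np.ones_like(img, dtype=bool)
    mask[y0:y1, x0:x1] = False
    out[y0:y1, x0:x1] = img[mask].mean()  # stand-in for an inpainting net
    return out

def discrepancy(img, y0, y1, x0, x1):
    """Mean absolute difference between original and inpainted patch."""
    filled = inpaint_with_surround(img, y0, y1, x0, x1)
    return np.abs(img - filled)[y0:y1, x0:x1].mean()

road = np.full((32, 32), 0.4)            # uniform road texture
obstacle = road.copy()
obstacle[12:20, 12:20] = 0.9             # bright obstacle patch
print(discrepancy(road, 12, 20, 12, 20) < 0.01)     # True: nothing erased
print(discrepancy(obstacle, 12, 20, 12, 20) > 0.3)  # True: obstacle erased
```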
4. BigNeuron: a resource to benchmark and predict performance of algorithms for automated tracing of neurons in light microscopy datasets. Nature Methods 2023; 20:824-835. PMID: 37069271; DOI: 10.1038/s41592-023-01848-5.
Abstract
BigNeuron is an open community bench-testing platform with the goal of setting open standards for accurate and fast automatic neuron tracing. We gathered a diverse set of image volumes across several species that is representative of the data obtained in many neuroscience laboratories interested in neuron tracing. Here, we report gold standard manual annotations generated for a subset of the available imaging datasets and quantify tracing quality for 35 automatic tracing algorithms. The goal of generating such a hand-curated diverse dataset is to advance the development of tracing algorithms and enable generalizable benchmarking. Together with image quality features, we pooled the data in an interactive web application that enables users and developers to perform principal component analysis, t-distributed stochastic neighbor embedding, correlation and clustering, visualization of imaging and tracing data, and benchmarking of automatic tracing algorithms in user-defined data subsets. The image quality metrics explain most of the variance in the data, followed by neuromorphological features related to neuron size. We observed that diverse algorithms can provide complementary information to obtain accurate results and developed a method to iteratively combine methods and generate consensus reconstructions. The consensus trees obtained provide estimates of the neuron structure ground truth that typically outperform single algorithms in noisy datasets. However, specific algorithms may outperform the consensus tree strategy in specific imaging conditions. Finally, to aid users in predicting the most accurate automatic tracing results without manual annotations for comparison, we used support vector machine regression to predict reconstruction quality given an image volume and a set of automatic tracings.
5. Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:6415-6427. PMID: 36251908; DOI: 10.1109/tpami.2022.3215307.
Abstract
In this article we propose an unsupervised feature extraction method to capture temporal information in monocular videos: we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs, as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying a contrastive loss only to the time-variant features, encouraging a gradual transition on them between nearby and distant frames, and simultaneously reconstructing the input extracts rich temporal features that are well-suited for human pose estimation. Our approach reduces error by about 50% compared to standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques. When 2D poses are available, our approach can extract even richer latent features and improve 3D pose estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
6. Persistent Homology with Improved Locality Information for More Effective Delineation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; PP:1-8. PMID: 37028072; DOI: 10.1109/tpami.2023.3246921.
Abstract
Persistent Homology (PH) has been successfully used to train networks to detect curvilinear structures and to improve the topological quality of their results. However, existing methods operate globally and ignore the location of topological features. In this paper, we remedy this by introducing a new filtration function that fuses two earlier approaches: thresholding-based filtration, previously used to train deep networks to segment medical images, and filtration with height functions, typically used to compare 2D and 3D shapes. We experimentally demonstrate that deep networks trained using our PH-based loss function yield reconstructions of road networks and neuronal processes that reflect ground-truth connectivity better than networks trained with existing loss functions based on PH.
7. Perspective Aware Road Obstacle Detection. IEEE Robotics and Automation Letters 2023. DOI: 10.1109/lra.2023.3245410.
8. Self-Supervised Human Detection and Segmentation via Background Inpainting. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:9574-9588. PMID: 34714741; DOI: 10.1109/tpami.2021.3123902.
Abstract
While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this when annotating data is prohibitively expensive, we introduce a self-supervised detection and segmentation approach that can work with single images captured by a potentially moving camera. At the heart of our approach lies the observation that object segmentation and background reconstruction are linked tasks, and that, for structured scenes, background regions can be re-synthesized from their surroundings, whereas regions depicting the moving object cannot. We encode this intuition into a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of the proposals, we develop a Monte Carlo-based training strategy that allows the algorithm to explore the large space of object proposals. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks and outperform existing self-supervised methods.
9. Adjusting the Ground Truth Annotations for Connectivity-Based Learning to Delineate. IEEE Transactions on Medical Imaging 2022; 41:3675-3685. PMID: 35862340; DOI: 10.1109/tmi.2022.3193072.
Abstract
Deep learning-based approaches to delineating 3D structures depend on accurate annotations to train the networks. Yet in practice, annotators, no matter how conscientious, have trouble delineating precisely in 3D and on a large scale, in part because the data is often hard to interpret visually and in part because 3D interfaces are awkward to use. In this paper, we introduce a method that explicitly accounts for annotation inaccuracies. To this end, we treat the annotations as active contour models that can deform themselves while preserving their topology. This enables us to jointly train the network and correct potential errors in the original annotations. The result is an approach that boosts the performance of deep networks trained with potentially inaccurate annotations.
10. Counting People by Estimating People Flows. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:8151-8166. PMID: 34351854; DOI: 10.1109/tpami.2021.3102690.
Abstract
Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case.
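The people-conservation principle underlying this formulation can be illustrated with a toy one-dimensional example (hypothetical, not the paper's network): densities are derived from inter-cell flows, so the total count is conserved between frames by construction:

```python
import numpy as np

# Sketch of people conservation (toy 1D version of the idea): densities
# are not regressed directly but derived from flows between neighboring
# cells, so people can neither appear nor vanish between frames.

def density_from_flows(flow):
    """flow[i, j] = number of people moving from cell i to cell j between
    frames t-1 and t (i == j counts people who stay put)."""
    d_prev = flow.sum(axis=1)  # people leaving each cell at t-1
    d_next = flow.sum(axis=0)  # people arriving in each cell at t
    return d_prev, d_next

flow = np.array([[2., 1., 0.],
                 [0., 3., 1.],
                 [0., 0., 2.]])
d_prev, d_next = density_from_flows(flow)
print(d_prev.sum() == d_next.sum())  # True: total count is conserved
```

A predicted flow field therefore encodes a hard constraint on consecutive density maps, which is what makes the formulation stronger than per-frame density regression.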
11. Promoting Connectivity of Network-Like Structures by Enforcing Region Separation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5401-5413. PMID: 33881988; DOI: 10.1109/tpami.2021.3074366.
Abstract
We propose a novel, connectivity-oriented loss function for training deep convolutional networks to reconstruct network-like structures, such as roads and irrigation canals, from aerial images. The main idea behind our loss is to express the connectivity of roads, or canals, in terms of the disconnections that they create between background regions of the image. In simple terms, a gap in a predicted road causes two background regions that lie on opposite sides of a ground-truth road to touch in the prediction. Our loss function is designed to prevent such unwanted connections between background regions, and therefore to close the gaps in predicted roads. It also prevents false-positive roads and canals by penalizing unwarranted disconnections of background regions. In order to capture even short, dead-ending road segments, we evaluate the loss in small image crops. We show, in experiments on two standard road benchmarks and a new dataset of irrigation canals, that convnets trained with our loss function recover road connectivity so well that it suffices to skeletonize their output to produce state-of-the-art maps. A distinct advantage of our approach is that the loss can be plugged into any existing training setup without further modifications.
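The intuition behind the loss can be checked with a toy example (an illustration of the connectivity argument only, not the differentiable loss itself): a gap in a predicted road merges the two background components that the ground-truth road keeps separated:

```python
import numpy as np
from collections import deque

# Toy illustration: a gap in a predicted road lets the two background
# regions on either side merge into one connected component, which a
# connectivity-oriented loss can penalize.

def count_background_components(mask):
    """Count 4-connected components of the background (mask == 0)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    n = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] == 0 and not seen[sy, sx]:
                n += 1                       # new component found; flood fill it
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] == 0 and not seen[ny, nx]):
                            seen[ny, nx] = True
                            q.append((ny, nx))
    return n

road = np.zeros((5, 5), dtype=int)
road[2, :] = 1                       # a horizontal road splits the background
broken = road.copy()
broken[2, 2] = 0                     # a gap in the prediction
print(count_background_components(road))    # 2: background kept separated
print(count_background_components(broken))  # 1: the gap merged the regions
```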
12. Robust Differentiable SVD. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:5472-5487. PMID: 33844626; DOI: 10.1109/tpami.2021.3072422.
Abstract
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this using SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying in practice on an iterative process and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.
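The instability described above is easy to reproduce: the analytic eigenvector gradient contains factors 1/(lambda_i - lambda_j), which explode when two eigenvalues nearly coincide. A small sketch (illustrative only, not the paper's Taylor-expansion method):

```python
import numpy as np

# Sketch: the analytic eigenvector gradient involves factors
# 1/(li - lj), which blow up when two eigenvalues are close.
# This is the instability the paper addresses.

def eigvec_grad_scale(A):
    """Largest 1/(li - lj) factor appearing in the eig backward pass."""
    w = np.linalg.eigvalsh(A)                       # ascending eigenvalues
    diffs = np.abs(w[:, None] - w[None, :])         # all pairwise gaps
    off = diffs[~np.eye(len(w), dtype=bool)]        # drop the diagonal
    return 1.0 / off.min()

well_separated = np.diag([1.0, 2.0, 3.0])
near_degenerate = np.diag([1.0, 1.0 + 1e-8, 3.0])
print(eigvec_grad_scale(well_separated))   # 1.0: well conditioned
print(eigvec_grad_scale(near_degenerate))  # ~1e8: unstable gradients
```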
13. 3D reconstruction of curvilinear structures with stereo matching deep convolutional neural networks. Ultramicroscopy 2022; 234:113460. PMID: 35121280; DOI: 10.1016/j.ultramic.2021.113460.
Abstract
Curvilinear structures frequently appear in microscopy imaging as the object of interest. Crystallographic defects, i.e., dislocations, are one such curvilinear structure; they have been repeatedly investigated under transmission electron microscopy (TEM), and their 3D structural information is of great importance for understanding the properties of materials. This 3D information is often obtained by tomography, a cumbersome process that requires acquiring many images at different tilt angles under similar imaging conditions. Although alternative stereoscopy methods lower the number of required images to two, they still require human intervention and shape priors for accurate 3D estimation. We propose a fully automated pipeline for both detection and matching of curvilinear structures in stereo pairs by utilizing deep convolutional neural networks (CNNs), without making any prior assumptions on 3D shapes. In this work, we mainly focus on the 3D reconstruction of dislocations from stereo pairs of TEM images.
14. GarNet++: Improving Fast and Accurate Static 3D Cloth Draping by Curvature Loss. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:181-195. PMID: 32750825; DOI: 10.1109/tpami.2020.3010886.
Abstract
In this paper, we tackle the problem of static 3D cloth draping on virtual human bodies. We introduce a two-stream deep network model that produces a visually plausible draping of a template cloth on virtual 3D bodies by extracting features from both the body and garment shapes. Our network learns to mimic a physics-based simulation (PBS) method while requiring two orders of magnitude less computation time. To train the network, we introduce loss terms inspired by PBS to produce plausible results and make the model collision-aware. To increase the details of the draped garment, we introduce two loss functions that penalize the difference between the curvature of the predicted cloth and PBS. Particularly, we study the impact of mean curvature normal and a novel detail-preserving loss both qualitatively and quantitatively. Our new curvature loss computes the local covariance matrices of the 3D points, and compares the Rayleigh quotients of the prediction and PBS. This leads to more details while performing favorably or comparably against the loss that considers mean curvature normal vectors in the 3D triangulated meshes. We validate our framework on four garment types for various body shapes and poses. Finally, we achieve superior performance against a recently proposed data-driven method.
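The covariance-based curvature measure can be sketched as follows (a simplified assumption about the general idea, not GarNet++'s exact loss): the smallest eigenvalue of the local covariance, i.e., the minimal Rayleigh quotient, vanishes for flat patches and grows with curvature:

```python
import numpy as np

# Sketch of a covariance-based curvature proxy (an assumption about the
# general idea, not the paper's exact loss): flat 3D neighborhoods have
# a near-zero smallest covariance eigenvalue, curved ones do not.

def flatness(points):
    """Smallest eigenvalue of the local 3D covariance, i.e., the minimal
    Rayleigh quotient x^T C x / x^T x over directions x."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    return float(np.linalg.eigvalsh(cov)[0])  # ~0 for a perfect plane

rng = np.random.default_rng(0)
xy = rng.uniform(-1, 1, size=(200, 2))
flat_patch = np.c_[xy, np.zeros(200)]        # points on the plane z = 0
curved_patch = np.c_[xy, xy[:, 0] ** 2]      # points on the surface z = x^2
print(flatness(flat_patch) < 1e-12)          # True
print(flatness(curved_patch) > flatness(flat_patch))  # True
```

Comparing such quotients between a predicted garment and its physics-based simulation penalizes missing wrinkle detail without requiring explicit mesh curvature normals.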
15. Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:3167-3182. PMID: 32149625; DOI: 10.1109/tpami.2020.2978812.
Abstract
Many classical Computer Vision problems, such as essential matrix computation and pose estimation from 3D to 2D correspondences, can be tackled by solving a linear least-square problem, which can be done by finding the eigenvector corresponding to the smallest, or zero, eigenvalue of a matrix representing a linear system. Incorporating this in deep learning frameworks would allow us to explicitly encode known notions of geometry, instead of having the network implicitly learn them from data. However, performing eigendecomposition within a network requires the ability to differentiate this operation. While theoretically doable, this introduces numerical instability in the optimization process in practice. In this paper, we introduce an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network. We demonstrate that our approach is much more robust than explicit differentiation of the eigendecomposition using two general tasks, outlier rejection and denoising, with several practical examples including wide-baseline stereo, the perspective-n-point problem, and ellipse fitting. Empirically, our method has better convergence properties and yields state-of-the-art results.
16. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nature Methods 2021; 18:975-981. PMID: 34354294; PMCID: PMC7611544; DOI: 10.1038/s41592-021-01226-z.
Abstract
Markerless three-dimensional (3D) pose estimation has become an indispensable tool for kinematic studies of laboratory animals. Most current methods recover 3D poses by multi-view triangulation of deep network-based two-dimensional (2D) pose estimates. However, triangulation requires multiple synchronized cameras and elaborate calibration protocols that hinder its widespread adoption in laboratory studies. Here we describe LiftPose3D, a deep network-based method that overcomes these barriers by reconstructing 3D poses from a single 2D camera view. We illustrate LiftPose3D's versatility by applying it to multiple experimental systems using flies, mice, rats and macaques, and in circumstances where 3D triangulation is impractical or impossible. Our framework achieves accurate lifting for stereotypical and nonstereotypical behaviors from different camera angles. Thus, LiftPose3D permits high-quality 3D pose estimation in the absence of complex camera arrays and tedious calibration procedures and despite occluded body parts in freely behaving animals.
17. Matching Seqlets: An Unsupervised Approach for Locality Preserving Sequence Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:745-752. PMID: 31425018; DOI: 10.1109/tpami.2019.2934052.
Abstract
In this paper, we propose a novel unsupervised approach for sequence matching that explicitly accounts for the locality properties of the sequences. In contrast to conventional approaches that rely on frame-to-frame matching, we conduct matching using sequencelets, or seqlets: sub-sequences whose frames share strong similarities and are thus grouped together. The optimal seqlets and the matching between them are learned jointly, without any supervision from users. The learned seqlets preserve locality information at the scale of interest and resolve matching ambiguities that frame-based matching methods overlook. We show that our proposed approach outperforms the state-of-the-art ones on datasets from different domains, including human actions, facial expressions, speech, and character strokes.
19. Joint Segmentation and Path Classification of Curvilinear Structures. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020; 42:1515-1521. PMID: 31180837; DOI: 10.1109/tpami.2019.2921327.
Abstract
Detection of curvilinear structures in images has long been of interest. One of the most challenging aspects of this problem is inferring the graph representation of the curvilinear network. Most existing delineation approaches first perform binary segmentation of the image and then refine it using either a set of hand-designed heuristics or a separate classifier that assigns likelihood to paths extracted from the pixel-wise prediction. In our work, we bridge the gap between segmentation and path classification by training a deep network that performs those two tasks simultaneously. We show that this approach is beneficial because it enforces consistency across the whole processing pipeline. We apply our approach on roads and neurons datasets.
20. Visual Correspondences for Unsupervised Domain Adaptation on Electron Microscopy Images. IEEE Transactions on Medical Imaging 2020; 39:1256-1267. PMID: 31603817; DOI: 10.1109/tmi.2019.2946462.
Abstract
We present an Unsupervised Domain Adaptation strategy to compensate for domain shifts on Electron Microscopy volumes. Our method aggregates visual correspondences (motifs that are visually similar across different acquisitions) to infer changes in the parameters of pretrained models and enable them to operate on new data. In particular, we examine the annotations of an existing acquisition to determine pivot locations that characterize the reference segmentation, and use a patch matching algorithm to find their candidate visual correspondences in a new volume. We aggregate all the candidate correspondences by a voting scheme and use them to construct a consensus heatmap: a map of how frequently locations on the new volume are matched to relevant locations from the original acquisition. This information allows us to perform model adaptation in two different ways: either by (a) optimizing model parameters under a Multiple Instance Learning formulation, so that predictions between reference locations and their sets of correspondences agree, or by (b) using high-scoring regions of the heatmap as soft labels to be incorporated in other domain adaptation pipelines, including deep learning ones. We show that these unsupervised techniques allow us to obtain high-quality segmentations on unannotated volumes, qualitatively consistent with results obtained under full supervision, for both mitochondria and synapses, with no need for new annotation effort.
21. Voxel2Mesh: 3D Mesh Model Generation from Volumetric Data. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. DOI: 10.1007/978-3-030-59719-1_30.
22. Tracing in 2D to reduce the annotation effort for 3D deep delineation of linear structures. Medical Image Analysis 2019; 60:101590. PMID: 31841949; DOI: 10.1016/j.media.2019.101590.
Abstract
The difficulty of obtaining annotations to build training databases still slows down the adoption of recent deep learning approaches for biomedical image analysis. In this paper, we show that we can train a Deep Net to perform 3D volumetric delineation given only 2D annotations in Maximum Intensity Projections (MIP) of the training volumes. This significantly reduces the annotation time: We conducted a user study that suggests that annotating 2D projections is on average twice as fast as annotating the original 3D volumes. Our technical contribution is a loss function that evaluates a 3D prediction against annotations of 2D projections. It is inspired by space carving, a classical approach to reconstructing complex 3D shapes from arbitrarily-positioned cameras. It can be used to train any deep network with volumetric output, without the need to change the network's architecture. Substituting the loss is all it takes to enable 2D annotations in an existing training setup. In extensive experiments on 3D light microscopy images of neurons and retinal blood vessels, and on Magnetic Resonance Angiography (MRA) brain scans, we show that, when trained on projection annotations, deep delineation networks perform as well as when they are trained using costlier 3D annotations.
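The projection-based loss can be sketched as follows (an assumed minimal form, not the paper's exact space-carving loss): the 3D prediction is reduced by a maximum-intensity projection and compared against the 2D annotation, so only 2D labels are needed to supervise a volumetric output:

```python
import numpy as np

# Minimal sketch (assumed form, not the paper's exact loss): compare the
# maximum-intensity projection (MIP) of a 3D prediction against a 2D
# annotation of that projection.

def mip_loss(pred_volume, annotation_2d, axis=0):
    """Mean squared error between a MIP of the prediction and a 2D label."""
    projection = pred_volume.max(axis=axis)
    return float(((projection - annotation_2d) ** 2).mean())

volume = np.zeros((4, 5, 5))
volume[2, 1:4, 2] = 1.0                   # a short filament in the volume
annotation = volume.max(axis=0)           # a perfect 2D MIP annotation
print(mip_loss(volume, annotation))       # 0.0: projection matches the label
print(mip_loss(np.zeros_like(volume), annotation) > 0)  # True
```

Because the loss only touches the network's output volume, any volumetric architecture can be supervised this way without modification, which matches the drop-in property described above.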
23. DeepFly3D, a deep learning-based approach for 3D limb and appendage tracking in tethered, adult Drosophila. eLife 2019; 8:e48571. PMID: 31584428; PMCID: PMC6828327; DOI: 10.7554/elife.48571.
Abstract
Studying how neural circuits orchestrate limbed behaviors requires the precise measurement of the positions of each appendage in three-dimensional (3D) space. Deep neural networks can estimate two-dimensional (2D) pose in freely behaving and tethered animals. However, the unique challenges associated with transforming these 2D measurements into reliable and precise 3D poses have not been addressed for small animals including the fly, Drosophila melanogaster. Here, we present DeepFly3D, a software that infers the 3D pose of tethered, adult Drosophila using multiple camera images. DeepFly3D does not require manual calibration, uses pictorial structures to automatically detect and correct pose estimation errors, and uses active learning to iteratively improve performance. We demonstrate more accurate unsupervised behavioral embedding using 3D joint angles rather than commonly used 2D pose data. Thus, DeepFly3D enables the automated acquisition of Drosophila behavioral measurements at an unprecedented level of detail for a variety of biological applications.
|
24
|
A Performance Evaluation of Local Features for Image-Based 3D Reconstruction. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 28:4774-4789. [PMID: 30969920 DOI: 10.1109/tip.2019.2909640] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
This paper presents a comprehensive, comparative evaluation of state-of-the-art local features for the task of image-based 3D reconstruction. The evaluated local features include both recently developed features learned with powerful machine learning techniques and elaborately designed handcrafted ones; to make the evaluation comprehensive, both float-type and binary features are considered. Two kinds of datasets are used. The first consists of images of many different scene types captured at fixed positions, with ground-truth 3D points, for quantitative evaluation of the local features in a controlled image-capturing setting. The second contains Internet-scale image sets of several landmarks mixed with many unrelated images, for qualitative evaluation in the free image-collection setting. Our experimental results show that binary features suffice to reconstruct scenes from controlled image sequences in only a fraction of the processing time required by float-type features. However, for large-scale image sets with many distracting images, float-type features show a clear advantage over binary ones. The traditional SIFT remains very stable across scene types in this task and produces very competitive reconstruction results among all the evaluated local features. Although the learned binary features are not as competitive as the handcrafted ones, learning float-type features with CNNs is promising but still requires further effort.
|
25
|
Mo2Cap2: Real-time Mobile 3D Motion Capture with a Cap-mounted Fisheye Camera. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2019; 25:2093-2101. [PMID: 30794176 DOI: 10.1109/tvcg.2019.2898650] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We propose the first real-time system for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities. This setting has a unique set of challenges, such as mobility of the hardware setup and robustness to long capture sessions with fast recovery from tracking failures. We tackle these challenges based on a novel lightweight setup that converts a standard baseball cap into a device for high-quality pose estimation based on a single cap-mounted fisheye camera. From the captured egocentric live stream, our CNN-based 3D pose estimation approach runs at 60 Hz on a consumer-level GPU. In addition to the lightweight hardware setup, our other main contributions are: 1) a large ground-truth training corpus of top-down fisheye images and 2) a disentangled 3D pose estimation approach that takes the unique properties of the egocentric viewpoint into account. As shown by our evaluation, we achieve lower 3D joint error as well as better 2D overlay than the existing baselines.
|
26
|
Beyond Sharing Weights for Deep Domain Adaptation. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:801-814. [PMID: 29994060 DOI: 10.1109/tpami.2018.2814042] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
The performance of a classifier trained on data coming from a specific domain typically degrades when applied to a related but different one. While annotating many samples from the new domain would address this issue, it is often too expensive or impractical. Domain Adaptation has therefore emerged as a solution to this problem; it leverages annotated data from a source domain, in which it is abundant, to train a classifier to operate in a target domain, in which it is either sparse or even lacking altogether. In this context, the recent trend consists of learning deep architectures whose weights are shared for both domains, which essentially amounts to learning domain invariant features. Here, we show that it is more effective to explicitly model the shift from one domain to the other. To this end, we introduce a two-stream architecture, where one operates in the source domain and the other in the target domain. In contrast to other approaches, the weights in corresponding layers are related but not shared. We demonstrate that this both yields higher accuracy than state-of-the-art methods on several object recognition and detection tasks and consistently outperforms networks with shared weights in both supervised and unsupervised settings.
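The "related but not shared" weights can be caricatured with a toy regularizer (a minimal sketch under assumptions; the paper's actual linking term and its placement in the training objective may differ, and the function name is hypothetical):

```python
import numpy as np

def related_weight_penalty(w_src, w_tgt, lam=0.1):
    # Instead of sharing weights across the two streams, penalize how far
    # corresponding layers drift apart. lam controls how strongly the
    # streams are tied: lam -> infinity recovers hard weight sharing,
    # lam = 0 trains two independent networks.
    return lam * sum(float(np.sum((ws - wt) ** 2))
                     for ws, wt in zip(w_src, w_tgt))
```

This penalty would be added to the usual task loss, so each stream can specialize to its domain while staying close to its counterpart.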
|
27
|
The effects of aging on neuropil structure in mouse somatosensory cortex-A 3D electron microscopy analysis of layer 1. PLoS One 2018. [PMID: 29966021 DOI: 10.5061/dryad.bh78sn5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open
Abstract
This study has used dense reconstructions from serial EM images to compare the neuropil ultrastructure and connectivity of aged and adult mice. The analysis used models of axons, dendrites, and their synaptic connections, reconstructed from volumes of neuropil imaged in layer 1 of the somatosensory cortex. This shows the changes to neuropil structure that accompany a general loss of synapses in a well-defined brain region. The loss of excitatory synapses was balanced by an increase in their size such that the total amount of synaptic surface, per unit length of axon and per unit volume of neuropil, stayed the same. There was also a greater reduction of inhibitory synapses than excitatory ones, particularly those found on dendritic spines, resulting in an increase in the excitatory/inhibitory balance. The close correlations between spine volume, bouton volume, synaptic size, and docked vesicle numbers that exist in young and adult neurons are all preserved during aging. These comparisons display features that indicate a reduced plasticity of cortical circuits, with fewer, more transient connections, but nevertheless an enhancement of the remaining connectivity that compensates for a generalized synapse loss.
|
28
|
The effects of aging on neuropil structure in mouse somatosensory cortex-A 3D electron microscopy analysis of layer 1. PLoS One 2018; 13:e0198131. [PMID: 29966021 PMCID: PMC6028106 DOI: 10.1371/journal.pone.0198131] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2017] [Accepted: 05/14/2018] [Indexed: 11/19/2022] Open
Abstract
This study has used dense reconstructions from serial EM images to compare the neuropil ultrastructure and connectivity of aged and adult mice. The analysis used models of axons, dendrites, and their synaptic connections, reconstructed from volumes of neuropil imaged in layer 1 of the somatosensory cortex. This shows the changes to neuropil structure that accompany a general loss of synapses in a well-defined brain region. The loss of excitatory synapses was balanced by an increase in their size such that the total amount of synaptic surface, per unit length of axon and per unit volume of neuropil, stayed the same. There was also a greater reduction of inhibitory synapses than excitatory ones, particularly those found on dendritic spines, resulting in an increase in the excitatory/inhibitory balance. The close correlations between spine volume, bouton volume, synaptic size, and docked vesicle numbers that exist in young and adult neurons are all preserved during aging. These comparisons display features that indicate a reduced plasticity of cortical circuits, with fewer, more transient connections, but nevertheless an enhancement of the remaining connectivity that compensates for a generalized synapse loss.
|
29
|
Robust 3D Object Tracking from Monocular Images Using Stable Parts. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2018; 40:1465-1479. [PMID: 28574342 DOI: 10.1109/tpami.2017.2708711] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We present an algorithm for estimating the pose of a rigid object in real-time under challenging conditions. Our method effectively handles poorly textured objects in cluttered, changing environments, even when their appearance is corrupted by large occlusions, and it relies on grayscale images to handle metallic environments on which depth cameras would fail. As a result, our method is suitable for practical Augmented Reality applications including industrial environments. At the core of our approach is a novel representation for the 3D pose of object parts: We predict the 3D pose of each part in the form of the 2D projections of a few control points. The advantages of this representation are threefold: We can predict the 3D pose of the object even when only one part is visible; when several parts are visible, we can easily combine them to compute a better pose of the object; the 3D pose we obtain is usually very accurate, even when only a few parts are visible. We show how to use this representation in a robust 3D tracking framework. In addition to extensive comparisons with the state-of-the-art, we demonstrate our method on a practical Augmented Reality application for maintenance assistance in the ATLAS particle detector at CERN.
|
30
|
Reconstructing Evolving Tree Structures in Time Lapse Sequences by Enforcing Time-Consistency. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2018; 40:755-761. [PMID: 28333621 DOI: 10.1109/tpami.2017.2680444] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
We propose a novel approach to reconstructing curvilinear tree structures evolving over time, such as road networks in 2D aerial images or neural structures in 3D microscopy stacks acquired in vivo. To enforce temporal consistency, we simultaneously process all images in a sequence, as opposed to reconstructing structures of interest in each image independently. We formulate the problem as a Quadratic Mixed Integer Program and demonstrate the additional robustness that comes from using all available visual clues at once, instead of working frame by frame. Furthermore, when the linear structures undergo local changes over time, our approach automatically detects them.
|
31
|
Unsupervised Geometry-Aware Representation for 3D Human Pose Estimation. COMPUTER VISION – ECCV 2018 2018. [DOI: 10.1007/978-3-030-01249-6_46] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
|
32
|
Geometric Graph Matching Using Monte Carlo Tree Search. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2017; 39:2171-2185. [PMID: 28114003 DOI: 10.1109/tpami.2016.2636200] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
We present an efficient matching method for generalized geometric graphs. Such graphs consist of vertices in space connected by curves and can represent many real-world structures, such as road networks in remote sensing or vessel networks in medical imaging. Graph matching can be used for very fast, possibly multimodal registration of images of these structures. We formulate the matching problem as a single-player game solved using Monte Carlo Tree Search, which automatically balances exploring new possible matches and extending existing ones. Our method handles partial matches, topological differences, and geometric distortion; it uses no appearance information and requires no initial alignment. Moreover, our method is very efficient: it can match graphs with thousands of nodes, an order of magnitude more than the best competing method, and the matching takes only a few seconds.
|
33
|
Stereo-vision three-dimensional reconstruction of curvilinear structures imaged with a TEM. Ultramicroscopy 2017; 184:116-124. [PMID: 28888106 DOI: 10.1016/j.ultramic.2017.08.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2017] [Revised: 07/01/2017] [Accepted: 08/20/2017] [Indexed: 11/28/2022]
Abstract
Deriving accurate three-dimensional (3-D) structural information of materials at the nanometre level is often crucial for understanding their properties. Tomography in transmission electron microscopy (TEM) is a powerful technique that provides such information. It is however demanding and sometimes inapplicable, as it requires the acquisition of multiple images within a large tilt arc and hence prolonged exposure to electrons. In some cases, prior knowledge about the structure can tremendously simplify the 3-D reconstruction if incorporated adequately. Here, a novel algorithm is presented that is able to produce a full 3-D reconstruction of curvilinear structures from a stereo pair of TEM images acquired within a small tilt range that spans from only a few to tens of degrees. Reliability of the algorithm is demonstrated through reconstruction of a model 3-D object from its simulated projections, and is compared with that of conventional tomography. This method is experimentally demonstrated for the 3-D visualization of dislocation arrangements in a deformed metallic micro-pillar.
|
34
|
Tilt-less 3-D electron imaging and reconstruction of complex curvilinear structures. Sci Rep 2017; 7:10630. [PMID: 28878280 PMCID: PMC5587565 DOI: 10.1038/s41598-017-07537-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2016] [Accepted: 06/29/2017] [Indexed: 11/13/2022] Open
Abstract
The ability to obtain three-dimensional (3-D) information about morphologies of nanostructures elucidates many interesting properties of materials in both physical and biological sciences. Here we demonstrate a novel method in scanning transmission electron microscopy (STEM) that gives a fast and reliable assessment of the 3-D configuration of curvilinear nanostructures, all without needing to tilt the sample through an arc. Using one-dimensional crystalline defects known as dislocations as a prototypical example of a complex curvilinear object, we demonstrate their 3-D reconstruction two orders of magnitude faster than by standard tilt-arc TEM tomographic techniques, from data recorded by selecting different ray paths of the convergent STEM probe. Due to its speed and immunity to problems associated with a tilt arc, the tilt-less 3-D imaging offers important advantages for investigations of radiation-sensitive, polycrystalline, or magnetic materials. Further, by using a segmented detector, the total electron dose is reduced to a single STEM raster scan acquisition; our tilt-less approach will therefore open new avenues for real-time 3-D electron imaging of dynamic processes.
|
35
|
Detecting Flying Objects Using a Single Moving Camera. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2017; 39:879-892. [PMID: 28113698 DOI: 10.1109/tpami.2016.2564408] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
We propose an approach for detecting flying objects such as Unmanned Aerial Vehicles (UAVs) and aircraft when they occupy a small portion of the field of view, possibly moving against complex backgrounds, and are filmed by a camera that itself moves. We argue that solving such a difficult problem requires combining both appearance and motion cues. To this end we propose a regression-based approach for object-centric motion stabilization of image patches that allows us to achieve effective classification on spatio-temporal image cubes and outperform state-of-the-art techniques. As this problem has not yet been extensively studied, no test datasets are publicly available. We therefore built our own, both for UAVs and aircraft, and will make them publicly available so they can be used to benchmark future flying object detection and collision avoidance algorithms.
|
36
|
Network Flow Integer Programming to Track Elliptical Cells in Time-Lapse Sequences. IEEE TRANSACTIONS ON MEDICAL IMAGING 2017; 36:942-951. [PMID: 28029619 DOI: 10.1109/tmi.2016.2640859] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
We propose a novel approach to automatically tracking elliptical cell populations in time-lapse image sequences. Given an initial segmentation, we account for partial occlusions and overlaps by generating an over-complete set of competing detection hypotheses. To this end, we fit ellipses to portions of the initial regions and build a hierarchy of ellipses, which are then treated as cell candidates. We then select temporally consistent ones by solving to optimality an integer program with only one type of flow variables. This eliminates the need for heuristics to handle missed detections due to partial occlusions and complex morphology. We demonstrate the effectiveness of our approach on a range of challenging sequences consisting of clumped cells and show that it outperforms state-of-the-art techniques.
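The temporal-consistency selection can be caricatured as a minimum-cost one-to-one assignment of detections between consecutive frames (a brute-force toy, not the paper's hierarchy-of-ellipses integer program with flow variables; all names here are hypothetical):

```python
import itertools
import numpy as np

def match_frames(cost):
    # cost[i, j]: cost of linking detection i in frame t to detection j
    # in frame t+1 (e.g. distance between ellipse centers). Brute-force
    # search over permutations; real trackers solve an integer program
    # or min-cost flow to handle appearance, disappearance, and overlap.
    n = cost.shape[0]
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(best)
```

Solving the assignment globally, rather than greedily per detection, is what lets such formulations avoid heuristics for missed detections.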
|
37
|
Reconstructing Curvilinear Networks Using Path Classifiers and Integer Programming. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:2515-2530. [PMID: 26891482 DOI: 10.1109/tpami.2016.2519025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
We propose a novel approach to automated delineation of curvilinear structures that form complex and potentially loopy networks. By representing the image data as a graph of potential paths, we first show how to weight these paths using discriminatively-trained classifiers that are both robust and generic enough to be applied to very different imaging modalities. We then present an Integer Programming approach to finding the optimal subset of paths, subject to structural and topological constraints that eliminate implausible solutions. Unlike earlier approaches that assume a tree topology for the networks, ours explicitly models the fact that the networks may contain loops, and can reconstruct both cyclic and acyclic ones. We demonstrate the effectiveness of our approach on a variety of challenging datasets including aerial images of road networks and micrographs of neural arbors, and show that it outperforms state-of-the-art techniques.
|
38
|
Tracking Interacting Objects Using Intertwined Flows. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:2312-2326. [PMID: 26731639 DOI: 10.1109/tpami.2015.2513406] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
In this paper, we show that tracking different kinds of interacting objects can be formulated as a network-flow mixed integer program. This is made possible by tracking all objects simultaneously using intertwined flow variables and expressing the fact that one object can appear or disappear at locations where another is in terms of linear flow constraints. Our proposed method is able to track invisible objects whose only evidence is the presence of other objects that contain them. Furthermore, our tracklet-based implementation yields real-time tracking performance. We demonstrate the power of our approach on scenes involving cars and pedestrians, bags being carried and dropped by people, and balls being passed from one player to the next in team sports. In particular, we show that by estimating jointly and globally the trajectories of different types of objects, the presence of the ones which were not initially detected based solely on image evidence can be inferred from the detections of the others.
|
39
|
Simultaneous segmentation and anatomical labeling of the cerebral vasculature. Med Image Anal 2016; 32:201-15. [DOI: 10.1016/j.media.2016.03.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2015] [Revised: 01/20/2016] [Accepted: 03/16/2016] [Indexed: 11/24/2022]
|
40
|
Abstract
To provide efficient tools for the capture and modeling of acceptable virtual human poses, we propose a method for constraining the underlying joint structures based on real life data. Current tools for delimiting valid postures often employ techniques that do not represent joint limits in an intuitively satisfying manner, and furthermore are seldom directly derived from experimental data. Here, we propose a semi-automatic scheme for determining ball-and-socket joint limits by actual measurement and we apply it to modeling the shoulder complex, which—along with the hip complex—can be approximated by a three-degree-of-freedom ball-and-socket joint. Our first step is to measure the joint motion range using optical motion capture. We next convert the recorded values to joint poses encoded using a coherent quaternion field representation of the joint orientation space. Finally, we obtain a closed, continuous implicit surface approximation for the quaternion orientation-space boundary whose interior represents the complete space of valid orientations, enabling us to project invalid postures to the closest valid ones.
|
41
|
Multiscale Centerline Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:1327-1341. [PMID: 27295457 DOI: 10.1109/tpami.2015.2462363] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Finding the centerline and estimating the radius of linear structures is a critical first step in many applications, ranging from road delineation in 2D aerial images to modeling blood vessels, lung bronchi, and dendritic arbors in 3D biomedical image stacks. Existing techniques rely either on filters designed to respond to ideal cylindrical structures or on classification techniques. The former tend to become unreliable when the linear structures are very irregular, while the latter often have difficulties distinguishing centerline locations from neighboring ones, thus losing accuracy. We solve this problem by reformulating centerline detection in terms of a regression problem. We first train regressors to return the distances to the closest centerline in scale-space, and we apply them to the input images or volumes. The centerlines and the corresponding scale then correspond to the regressors' local maxima, which can be easily identified. We show that our method outperforms state-of-the-art techniques for various 2D and 3D datasets. Moreover, our approach is very generic and also performs well on contour detection. We show an improvement over recent contour detection algorithms on the BSDS500 dataset.
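The regression targets described above can be illustrated with a brute-force distance transform on a binary centerline mask (a toy sketch at a single scale; the actual method regresses distances in scale-space from learned image features, and the function name is hypothetical):

```python
import numpy as np

def centerline_distance_targets(mask):
    # For each pixel, the Euclidean distance to the nearest centerline
    # pixel (mask == 1). A regressor trained on (a decaying function of)
    # these values peaks exactly on the centerline, so centerlines can
    # be read off as local maxima of its output.
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1).astype(float)
    h, w = mask.shape
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                indexing="ij"), axis=-1).astype(float)
    d = np.linalg.norm(grid[:, :, None, :] - pts[None, None, :, :], axis=-1)
    return d.min(axis=2)
```

In practice one would use an efficient distance transform rather than this quadratic-time loop; the point is only the shape of the supervision signal.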
|
42
|
|
43
|
Computer vision profiling of neurite outgrowth dynamics reveals spatiotemporal modularity of Rho GTPase signaling. J Cell Biol 2016; 212:91-111. [PMID: 26728857 PMCID: PMC4700477 DOI: 10.1083/jcb.201506018] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
NeuriteTracker is a computer vision approach used to analyze neuronal morphodynamics and to examine spatiotemporal Rho GTPase signaling networks regulating neurite outgrowth. Rho guanosine triphosphatases (GTPases) control the cytoskeletal dynamics that power neurite outgrowth. This process consists of dynamic neurite initiation, elongation, retraction, and branching cycles that are likely to be regulated by specific spatiotemporal signaling networks, which cannot be resolved with static, steady-state assays. We present NeuriteTracker, a computer-vision approach to automatically segment and track neuronal morphodynamics in time-lapse datasets. Feature extraction then quantifies dynamic neurite outgrowth phenotypes. We identify a set of stereotypic neurite outgrowth morphodynamic behaviors in a cultured neuronal cell system. Systematic RNA interference perturbation of a Rho GTPase interactome consisting of 219 proteins reveals a limited set of morphodynamic phenotypes. As proof of concept, we show that loss of function of two distinct RhoA-specific GTPase-activating proteins (GAPs) leads to opposite neurite outgrowth phenotypes. Imaging of RhoA activation dynamics indicates that both GAPs regulate different spatiotemporal Rho GTPase pools, with distinct functions. Our results provide a starting point to dissect spatiotemporal Rho GTPase signaling networks that regulate neurite outgrowth.
|
44
|
Computer vision profiling of neurite outgrowth dynamics reveals spatiotemporal modularity of Rho GTPase signaling. J Exp Med 2016. [DOI: 10.1084/jem.2131oia128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open
|
45
|
Template-Based Monocular 3D Shape Recovery Using Laplacian Meshes. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2016; 38:172-187. [PMID: 26656585 DOI: 10.1109/tpami.2015.2435739] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We show that by extending the Laplacian formalism, which was first introduced in the Graphics community to regularize 3D meshes, we can turn the monocular 3D shape reconstruction of a deformable surface given correspondences with a reference image into a much better-posed problem. This allows us to quickly and reliably eliminate outliers by simply solving a linear least squares problem. This yields an initial 3D shape estimate, which is not necessarily accurate, but whose 2D projections are. The initial shape is then refined by a constrained optimization problem to output the final surface reconstruction. Our approach allows us to reduce the dimensionality of the surface reconstruction problem without sacrificing accuracy, thus allowing for real-time implementations.
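The Laplacian-regularized linear least squares step can be illustrated on a 1D toy "mesh" (a chain of vertices with soft point constraints); this is an assumed simplification for illustration, not the paper's 3D mesh formulation with 2D correspondences:

```python
import numpy as np

def laplacian_fit(n, constraints, lam=0.1):
    # Uniform graph Laplacian of a chain of n vertices (degree on the
    # diagonal, -1 for each neighbor).
    L = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            L[i, i - 1] = -1.0
            L[i, i] += 1.0
        if i < n - 1:
            L[i, i + 1] = -1.0
            L[i, i] += 1.0
    # Correspondence constraints: vertex index -> observed coordinate.
    C = np.zeros((len(constraints), n))
    b = np.zeros(len(constraints))
    for row, (idx, val) in enumerate(constraints.items()):
        C[row, idx] = 1.0
        b[row] = val
    # Normal equations of min ||C x - b||^2 + lam ||L x||^2:
    # the data term pins the constrained vertices, the Laplacian term
    # smoothly interpolates the unconstrained ones.
    A = C.T @ C + lam * (L.T @ L)
    return np.linalg.solve(A, C.T @ b)
```

Because the whole problem is a single linear solve, gross outliers among the correspondences can be detected and discarded cheaply by re-solving, which is the speed argument made in the abstract.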
|
46
|
|
47
|
Live texturing of augmented reality characters from colored drawings. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2015; 21:1201-1210. [PMID: 26340776 DOI: 10.1109/tvcg.2015.2459871] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Coloring books capture the imagination of children and provide them with one of their earliest opportunities for creative expression. However, given the proliferation and popularity of digital devices, real-world activities like coloring can seem unexciting, and children become less engaged in them. Augmented reality holds unique potential to impact this situation by providing a bridge between real-world activities and digital enhancements. In this paper, we present an augmented reality coloring book App in which children color characters in a printed coloring book and inspect their work using a mobile device. The drawing is detected and tracked, and the video stream is augmented with an animated 3-D version of the character that is textured according to the child's coloring. This is possible thanks to several novel technical contributions. We present a texturing process that applies the captured texture from a 2-D colored drawing to both the visible and occluded regions of a 3-D character in real time. We develop a deformable surface tracking method designed for colored drawings that uses a new outlier rejection algorithm for real-time tracking and surface deformation recovery. We present a content creation pipeline to efficiently create the 2-D and 3-D content. And, finally, we validate our work with two user studies that examine the quality of our texturing algorithm and the overall App experience.
|
48
|
Abstract
Electron and light microscopy imaging can now deliver high-quality image stacks of neural structures. However, the amount of human annotation effort required to analyze them remains a major bottleneck. While machine learning algorithms can be used to help automate this process, they require training data, which is time-consuming to obtain manually, especially in image stacks. Furthermore, due to changing experimental conditions, successive stacks often exhibit differences that are severe enough to make it difficult to use a classifier trained for a specific one on another. This means that this tedious annotation process has to be repeated for each new stack. In this paper, we present a domain adaptation algorithm that addresses this issue by effectively leveraging labeled examples across different acquisitions and significantly reducing the annotation requirements. Our approach can handle complex, nonlinear image feature transformations and scales to large microscopy datasets that often involve high-dimensional feature spaces and large 3D data volumes. We evaluate our approach on four challenging electron and light microscopy applications that exhibit very different image modalities and where annotation is very costly. Across all applications we achieve a significant improvement over the state-of-the-art machine learning methods and demonstrate our ability to greatly reduce human annotation effort.
|
49
|
Learning structured models for segmentation of 2-D and 3-D imagery. IEEE TRANSACTIONS ON MEDICAL IMAGING 2015; 34:1096-1110. [PMID: 25438309 DOI: 10.1109/tmi.2014.2376274] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Efficient and accurate segmentation of cellular structures in microscopic data is an essential task in medical imaging. Many state-of-the-art approaches to image segmentation use structured models whose parameters must be carefully chosen for optimal performance. A popular choice is to learn them using a large-margin framework and more specifically structured support vector machines (SSVM). Although SSVMs are appealing, they suffer from certain limitations. First, they are restricted in practice to linear kernels because the more powerful nonlinear kernels cause the learning to become prohibitively expensive. Second, they require iteratively finding the most violated constraints, which is often intractable for the loopy graphical models used in image segmentation. This requires approximations that can reduce the quality of learning. In this paper, we propose three novel techniques to overcome these limitations. We first introduce a method to "kernelize" the features so that a linear SSVM framework can leverage the power of nonlinear kernels without incurring much additional computational cost. Moreover, we employ a working set of constraints to increase the reliability of approximate subgradient methods and introduce a new way to select a suitable step size at each iteration. We demonstrate the strength of our approach on both 2-D and 3-D electron microscopic (EM) image data and show consistent performance improvement over state-of-the-art approaches.
|
50
|
Abstract
If we are ever to unravel the mysteries of brain function at its most fundamental level, we will need a precise understanding of how its component neurons connect to each other. Electron Microscopes (EM) can now provide the nanometer resolution that is needed to image synapses, and therefore connections, while Light Microscopes (LM) see at the micrometer resolution required to model the 3D structure of the dendritic network. Since both the topology and the connection strength are integral parts of the brain's wiring diagram, being able to combine these two modalities is critically important. In fact, these microscopes now routinely produce high-resolution imagery in such large quantities that the bottleneck becomes automated processing and interpretation, which is needed for such data to be exploited to its full potential. In this paper, we briefly review the Computer Vision techniques we have developed at EPFL to address this need. They include delineating dendritic arbors from LM imagery, segmenting organelles from EM, and combining the two into a consistent representation.
|