1. Liu Y, Wornell GW, Freeman WT, Durand F. Imaging privacy threats from an ambient light sensor. Sci Adv 2024; 10:eadj3608. PMID: 38198551. PMCID: PMC10780887. DOI: 10.1126/sciadv.adj3608.
Abstract
Embedded sensors in smart devices pose privacy risks, often unintentionally leaking user information. We investigate how combining an ambient light sensor with a device display can capture an image of touch interaction without a camera. By displaying a known video sequence, we use the light sensor to capture reflected light intensity variations partially blocked by the touching hand, formulating an inverse problem similar to single-pixel imaging. Because of the sensors' heavy quantization and low sensitivity, we propose an inversion algorithm involving an ℓp-norm dequantizer and a deep denoiser as natural image priors, to reconstruct images from the screen's perspective. We demonstrate touch interactions and eavesdropping hand gestures on an off-the-shelf Android tablet. Despite limitations in resolution and speed, we aim to raise awareness of potential security/privacy threats induced by the combination of passive and active components in smart devices and promote the development of ways to mitigate them.
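The measurement model described above is linear: each displayed frame is a known pattern, and the sensor reading is (up to noise and quantization) the inner product of that pattern with the scene's transmission mask. A minimal sketch of this single-pixel inverse problem, substituting plain ridge-regularized least squares for the paper's ℓp-norm dequantizer and deep-denoiser prior (scene size, patterns, and regularization weight are illustrative):

```python
import numpy as np

def reconstruct(patterns, readings, lam=1e-2):
    """Recover a flattened transmission image x from single-pixel
    readings y ~ patterns @ x via ridge-regularized least squares.
    (The paper uses an lp-norm dequantizer plus a deep denoiser;
    this is only the bare linear-inverse step.)"""
    A = patterns.reshape(patterns.shape[0], -1)   # T x N measurement matrix
    N = A.shape[1]
    # Regularized normal equations: (A^T A + lam I) x = A^T y
    return np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ readings)

# Simulate: an 8x8 occlusion mask (the "hand") and random display frames.
rng = np.random.default_rng(0)
truth = np.zeros((8, 8))
truth[2:6, 3:7] = 1.0                                     # blocking hand
patterns = rng.random((256, 8, 8))                        # known display frames
readings = patterns.reshape(256, -1) @ truth.ravel()      # ideal sensor values
est = reconstruct(patterns, readings).reshape(8, 8)
```

With more measurements than unknowns and no quantization, the system is overdetermined and the solve recovers the mask; the paper's machinery is needed precisely because real sensor readings are heavily quantized and noisy.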
Affiliations
- Yang Liu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Gregory W. Wornell
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- William T. Freeman
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Frédo Durand
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2. Li Z, Dekel T, Cole F, Tucker R, Snavely N, Liu C, Freeman WT. MannequinChallenge: Learning the Depths of Moving People by Watching Frozen People. IEEE Trans Pattern Anal Mach Intell 2021; 43:4229-4241. PMID: 32078534. DOI: 10.1109/TPAMI.2020.2974454.
Abstract
We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving. Existing methods for recovering depth for dynamic, non-rigid objects from monocular video impose strong assumptions on the objects' motion and may only recover sparse depth. In this paper, we take a data-driven approach and learn human depth priors from a new source of data: thousands of Internet videos of people imitating mannequins, i.e., freezing in diverse, natural poses, while a hand-held camera tours the scene. Because people are stationary, geometric constraints hold, so training data can be generated using multi-view stereo reconstruction. At inference time, our method uses motion parallax cues from the static areas of the scenes to guide the depth prediction. We evaluate our method on real-world sequences of complex human actions captured by a moving hand-held camera, show improvement over state-of-the-art monocular depth prediction methods, and demonstrate various 3D effects produced using our predicted depth.
3. Xue T, Wu J, Bouman KL, Freeman WT. Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks. IEEE Trans Pattern Anal Mach Intell 2019; 41:2236-2250. PMID: 30004870. DOI: 10.1109/TPAMI.2018.2854726.
Abstract
We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing it is implicitly learning a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.
4. Dalca AV, Bouman KL, Freeman WT, Rost NS, Sabuncu MR, Golland P. Medical Image Imputation from Image Collections. IEEE Trans Med Imaging 2018; 38. PMID: 30136936. PMCID: PMC6393212. DOI: 10.1109/TMI.2018.2866692.
Abstract
We present an algorithm for creating high resolution anatomically plausible images consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large data sets of clinical images contain a wealth of information, time constraints during acquisition result in sparse scans that fail to capture much of the anatomy. These characteristics often render computational analysis impractical as many image analysis algorithms tend to fail when applied to such images. Highly specialized algorithms that explicitly handle sparse slice spacing do not generalize well across problem domains. In contrast, we aim to enable application of existing algorithms that were originally developed for high resolution research scans to significantly undersampled scans. We introduce a generative model that captures fine-scale anatomical structure across subjects in clinical image collections and derive an algorithm for filling in the missing data in scans with large inter-slice spacing. Our experimental results demonstrate that the resulting method outperforms state-of-the-art upsampling super-resolution techniques, and promises to facilitate subsequent analysis not previously possible with scans of this quality. Our implementation is freely available at https://github.com/adalca/papago.
Affiliations
- Adrian V. Dalca
- Computer Science and Artificial Intelligence Lab, MIT; Martinos Center for Biomedical Imaging, Massachusetts General Hospital, HMS
- Natalia S. Rost
- Department of Neurology, Massachusetts General Hospital, HMS
- Mert R. Sabuncu
- School of Electrical and Computer Engineering, and Meinig School of Biomedical Engineering, Cornell University
5. Oron S, Dekel T, Xue T, Freeman WT, Avidan S. Best-Buddies Similarity - Robust Template Matching Using Mutual Nearest Neighbors. IEEE Trans Pattern Anal Mach Intell 2018; 40:1799-1813. PMID: 28796608. DOI: 10.1109/TPAMI.2017.2737424.
Abstract
We propose a novel method for template matching in unconstrained environments. Its essence is the Best-Buddies Similarity (BBS), a useful, robust, and parameter-free similarity measure between two sets of points. BBS is based on counting the number of Best-Buddies Pairs (BBPs)-pairs of points in source and target sets that are mutual nearest neighbours, i.e., each point is the nearest neighbour of the other. BBS has several key features that make it robust against complex geometric deformations and high levels of outliers, such as those arising from background clutter and occlusions. We study these properties, provide a statistical analysis that justifies them, and demonstrate the consistent success of BBS on a challenging real-world dataset while using different types of features.
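Counting mutual nearest neighbours is simple to state precisely. A small sketch of the BBS score between two point sets (brute-force distances; the paper applies this to patch descriptors inside a sliding window, which is not shown here):

```python
import numpy as np

def bbs(P, Q):
    """Best-Buddies Similarity: fraction of point pairs (p, q) that are
    mutual nearest neighbours across sets P and Q (rows are points)."""
    # Pairwise squared distances between every p in P and q in Q.
    d = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    nn_pq = d.argmin(axis=1)        # for each p, its nearest q
    nn_qp = d.argmin(axis=0)        # for each q, its nearest p
    buddies = sum(1 for i, j in enumerate(nn_pq) if nn_qp[j] == i)
    return buddies / min(len(P), len(Q))

template = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
perfect = bbs(template, template)                          # identical sets
# A target sharing only one point: most pairs lose their mutual-NN status.
cluttered = bbs(template, np.array([[0.0, 0.0], [10.0, 10.0], [10.0, 11.0]]))
```

Because a pair counts only if each point is the other's nearest neighbour, outliers and clutter points rarely form pairs, which is the source of the measure's robustness.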
6. Wu J, Xue T, Lim JJ, Tian Y, Tenenbaum JB, Torralba A, Freeman WT. 3D Interpreter Networks for Viewer-Centered Wireframe Modeling. Int J Comput Vis 2018. DOI: 10.1007/s11263-018-1074-6.
7.
Abstract
We present an algorithm for creating high resolution anatomically plausible images consistent with acquired clinical brain MRI scans with large inter-slice spacing. Although large databases of clinical images contain a wealth of information, medical acquisition constraints result in sparse scans that miss much of the anatomy. These characteristics often render computational analysis impractical as standard processing algorithms tend to fail when applied to such images. Highly specialized or application-specific algorithms that explicitly handle sparse slice spacing do not generalize well across problem domains. In contrast, our goal is to enable application of existing algorithms that were originally developed for high resolution research scans to significantly undersampled scans. We introduce a model that captures fine-scale anatomical similarity across subjects in clinical image collections and use it to fill in the missing data in scans with large slice spacing. Our experimental results demonstrate that the proposed method outperforms current upsampling methods and promises to facilitate subsequent analysis not previously possible with scans of this quality.
Affiliations
- Adrian V Dalca
- Computer Science and Artificial Intelligence Lab, MIT, Cambridge, USA
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, HMS, Charlestown, MA, USA
- William T Freeman
- Computer Science and Artificial Intelligence Lab, MIT, Cambridge, USA
- Google Research, Cambridge, MA, USA
- Natalia S Rost
- Department of Neurology, Massachusetts General Hospital, HMS, Boston, USA
- Mert R Sabuncu
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, HMS, Charlestown, MA, USA
- School of Electrical and Computer Engineering, Cornell, Ithaca, USA
- Polina Golland
- Computer Science and Artificial Intelligence Lab, MIT, Cambridge, USA
8. Davis A, Bouman KL, Chen JG, Rubinstein M, Buyukozturk O, Durand F, Freeman WT. Visual Vibrometry: Estimating Material Properties from Small Motions in Video. IEEE Trans Pattern Anal Mach Intell 2017; 39:732-745. PMID: 27875214. DOI: 10.1109/TPAMI.2016.2622271.
Abstract
The estimation of material properties is important for scene understanding, with many applications in vision, robotics, and structural engineering. This paper connects fundamentals of vibration mechanics with computer vision techniques in order to infer material properties from small, often imperceptible motions in video. Objects tend to vibrate in a set of preferred modes. The frequencies of these modes depend on the structure and material properties of an object. We show that by extracting these frequencies from video of a vibrating object, we can often make inferences about that object's material properties. We demonstrate our approach by estimating material properties for a variety of objects by observing their motion in high-speed and regular frame rate video.
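The core signal-processing step, recovering an object's modal frequencies from a motion trace, can be sketched with an FFT peak pick. The video side (extracting tiny motions from frames) is omitted; the synthetic trace below simply stands in for it, and the 12 Hz and 47 Hz modes are invented for the demo:

```python
import numpy as np

def dominant_frequencies(signal, fs, k=2):
    """Return the k strongest non-DC frequencies (Hz) in a motion signal,
    the kind of spectral peaks visual vibrometry reads material
    properties from."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[0] = 0.0                       # ignore the DC component
    top = np.argsort(spec)[-k:]         # indices of the k largest peaks
    return sorted(freqs[top])

fs = 1000.0
t = np.arange(2000) / fs                # 2 s of samples
# Two vibration modes, as might arise from an object's resonances.
sig = np.sin(2 * np.pi * 12 * t) + 0.5 * np.sin(2 * np.pi * 47 * t)
modes = dominant_frequencies(sig, fs)
```

The paper's inference step then maps such mode frequencies, together with an object's geometry, to material properties such as stiffness.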
9. Freeman WT, Szeliski R, Hager GD. Guest Editorial: Special Section on CVPR 2013. IEEE Trans Pattern Anal Mach Intell 2016; 38:625-626. PMID: 27403473. DOI: 10.1109/TPAMI.2016.2529898.
10. Fish VL, Johnson MD, Lu RS, Doeleman SS, Bouman KL, Zoran D, Freeman WT, Psaltis D, Narayan R, Pankratius V, Broderick AE, Gwinn CR, Vertatschitsch LE. Imaging an Event Horizon: Mitigation of Scattering toward Sagittarius A*. Astrophys J 2014; 795:134. DOI: 10.1088/0004-637X/795/2/134.
12. Cho TS, Zitnick CL, Joshi N, Kang SB, Szeliski R, Freeman WT. Image restoration by matching gradient distributions. IEEE Trans Pattern Anal Mach Intell 2012; 34:683-694. PMID: 21844632. DOI: 10.1109/TPAMI.2011.166.
Abstract
The restoration of a blurry or noisy image is commonly performed with a MAP estimator, which maximizes a posterior probability to reconstruct a clean image from a degraded image. A MAP estimator, when used with a sparse gradient image prior, reconstructs piecewise smooth images and typically removes textures that are important for visual realism. We present an alternative deconvolution method called iterative distribution reweighting (IDR) which imposes a global constraint on gradients so that a reconstructed image should have a gradient distribution similar to a reference distribution. In natural images, a reference distribution not only varies from one image to another, but also within an image depending on texture. We estimate a reference distribution directly from an input image for each texture segment. Our algorithm is able to restore rich mid-frequency textures. A large-scale user study supports the conclusion that our algorithm improves the visual realism of reconstructed images compared to those of MAP estimators.
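The global constraint at the heart of IDR is that reconstructed gradients should follow a reference distribution. The basic operation, forcing one empirical distribution to match another, can be sketched as a one-shot rank-order remap; the actual algorithm instead re-weights a MAP objective iteratively and estimates a separate reference per texture segment:

```python
import numpy as np

def match_distribution(x, ref):
    """Remap the values of x so their empirical distribution matches
    ref (rank-order / quantile matching). A one-shot stand-in for the
    global gradient-distribution constraint IDR enforces iteratively."""
    order = np.argsort(x)
    idx = np.linspace(0, len(ref) - 1, len(x)).round().astype(int)
    out = np.empty_like(x, dtype=float)
    out[order] = np.sort(ref)[idx]      # sorted ref values to x's ranks
    return out

rng = np.random.default_rng(1)
smoothed = rng.normal(0.0, 0.1, 1000)    # over-smoothed gradients: too narrow
reference = rng.normal(0.0, 1.0, 1000)   # gradient stats of a textured image
matched = match_distribution(smoothed, reference)
```

After the remap, the output's value distribution is exactly the reference's, which is how the global constraint restores mid-frequency texture that a plain MAP estimate smooths away.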
Affiliations
- Taeg Sang Cho
- WilmerHale, LLP, 60 State Street, Boston, MA 02139, USA.
13. Rubinstein M, Liu C, Freeman WT. Annotation Propagation in Large Image Databases via Dense Image Correspondence. In: Computer Vision - ECCV 2012. DOI: 10.1007/978-3-642-33712-3_7.
14. Levin A, Weiss Y, Durand F, Freeman WT. Understanding Blind Deconvolution Algorithms. IEEE Trans Pattern Anal Mach Intell 2011; 33:2354-2367. PMID: 21788664. DOI: 10.1109/TPAMI.2011.148.
Abstract
Blind deconvolution is the recovery of a sharp version of a blurred image when the blur kernel is unknown. Recent algorithms have afforded dramatic progress, yet many aspects of the problem remain challenging and hard to understand. The goal of this paper is to analyze and evaluate recent blind deconvolution algorithms both theoretically and experimentally. We explain the previously reported failure of the naive MAP approach by demonstrating that it mostly favors no-blur explanations. We show that, using reasonable image priors, a naive MAP estimation of both latent image and blur kernel is guaranteed to fail even with infinitely large images sampled from the prior. On the other hand, we show that since the kernel size is often smaller than the image size, a MAP estimation of the kernel alone is well constrained and is guaranteed to recover the true blur. The plethora of recent deconvolution techniques makes an experimental evaluation on ground-truth data important. As a first step toward this experimental evaluation, we have collected blur data with ground truth and compared recent algorithms under equal settings. Additionally, our data demonstrate that the shift-invariant blur assumption made by most algorithms is often violated.
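The favoring of no-blur explanations is easy to reproduce numerically: under a sparse-gradient prior, a blurred signal typically scores better (lower negative log-prior) than the sharp signal that produced it, so jointly maximizing over image and kernel settles on the blurry image paired with a delta kernel. A 1-D sketch, with the prior exponent and blur width as arbitrary choices:

```python
import numpy as np

def sparse_grad_cost(x, alpha=0.5):
    """Per-sample negative log-prior under a sparse-gradient image
    prior, |grad x|^alpha with alpha < 1."""
    return np.abs(np.diff(x)) ** alpha

rng = np.random.default_rng(0)
sharp = rng.normal(size=2000)                       # texture-like 1-D "image"
kernel = np.ones(5) / 5                             # box blur
blurred = np.convolve(sharp, kernel, mode="valid")  # observed blurry signal

# The blurry signal scores better (lower cost) under the sparse prior,
# so joint MAP over image and kernel prefers the no-blur explanation
# (x = y, k = delta) -- the failure mode the paper analyzes.
cost_sharp = sparse_grad_cost(sharp).sum()
cost_blurred = sparse_grad_cost(blurred).sum()
```

Note this holds for texture-like signals, where blur attenuates all gradients; an isolated step edge is the classic exception.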
15. Johnson MK, Dale K, Avidan S, Pfister H, Freeman WT, Matusik W. CG2Real: Improving the Realism of Computer Generated Images Using a Large Collection of Photographs. IEEE Trans Vis Comput Graph 2011; 17:1273-1285. PMID: 21041875. DOI: 10.1109/TVCG.2010.233.
Abstract
Computer-generated (CG) images have achieved high levels of realism. This realism, however, comes at the cost of long and expensive manual modeling, and often humans can still distinguish between CG and real images. We introduce a new data-driven approach for rendering realistic imagery that uses a large collection of photographs gathered from online repositories. Given a CG image, we retrieve a small number of real images with similar global structure. We identify corresponding regions between the CG and real images using a mean-shift cosegmentation algorithm. The user can then automatically transfer color, tone, and texture from matching regions to the CG image. Our system only uses image processing operations and does not require a 3D model of the scene, making it fast and easy to integrate into digital content creation workflows. Results of a user study show that our hybrid images appear more realistic than the originals.
16.
Abstract
The patch transform represents an image as a bag of overlapping patches sampled on a regular grid. This representation allows users to manipulate images in the patch domain, which then seeds the inverse patch transform to synthesize modified images. Possible modifications include the spatial locations of patches, the size of the output image, or the pool of patches from which an image is reconstructed. When no modifications are made, the inverse patch transform reduces to solving a jigsaw puzzle. The inverse patch transform is posed as a patch assignment problem on a Markov random field (MRF), where each patch should be used only once and neighboring patches should fit to form a plausible image. We find an approximate solution to the MRF using loopy belief propagation, introducing an approximation that encourages the solution to use each patch only once. The image reconstruction algorithm scales well with the total number of patches through label pruning. In addition, structural misalignment artifacts are suppressed through a patch jittering scheme that spatially jitters the assigned patches. We demonstrate the patch transform and its effectiveness on natural images.
Affiliations
- Taeg Sang Cho
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
18. Torralba A, Fergus R, Freeman WT. 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans Pattern Anal Mach Intell 2008; 30:1958-1970. PMID: 18787244. DOI: 10.1109/TPAMI.2008.128.
Abstract
With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. Hence, the image database gives comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
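The non-parametric recognition scheme reduces to nearest-neighbour lookup in a huge collection of tiny image vectors. A toy sketch with made-up two-class data standing in for the 79 million thumbnails (the labels are hypothetical stand-ins for WordNet nouns):

```python
import numpy as np

def classify_nn(query, data, labels):
    """Label a tiny image by its nearest neighbour in the collection
    (sum of squared differences) -- the non-parametric scheme the
    80-million-images work scales up, minus the WordNet voting."""
    d = ((data - query.ravel()) ** 2).sum(axis=1)
    return labels[int(d.argmin())]

rng = np.random.default_rng(0)
# Toy stand-ins for 32x32 colour thumbnails: a bright class and a dark class.
bright = rng.random((50, 32 * 32 * 3)) * 0.5 + 0.5
dark = rng.random((50, 32 * 32 * 3)) * 0.5
data = np.vstack([bright, dark])
labels = ["sky"] * 50 + ["cave"] * 50     # hypothetical WordNet nouns
pred = classify_nn(np.full(32 * 32 * 3, 0.9), data, labels)
```

The paper's point is that with a dense enough sampling of the visual world, even this simple matcher becomes competitive; the semantic hierarchy then lets neighbours vote at coarser WordNet levels to absorb label noise.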
Affiliations
- Antonio Torralba
- Computer Science and Artificial Intelligence Lab (CSAIL), Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA.
19. Liu C, Szeliski R, Kang SB, Zitnick CL, Freeman WT. Automatic estimation and removal of noise from a single image. IEEE Trans Pattern Anal Mach Intell 2008; 30:299-314. PMID: 18084060. DOI: 10.1109/TPAMI.2007.1176.
Abstract
Image denoising algorithms often assume an additive white Gaussian noise (AWGN) process that is independent of the actual RGB values. Such approaches are not fully automatic and cannot effectively remove color noise produced by today's CCD digital cameras. In this paper, we propose a unified framework for two tasks: automatic estimation and removal of color noise from a single image using piecewise smooth image models. We introduce the noise level function (NLF), which is a continuous function describing the noise level as a function of image brightness. We then estimate an upper bound of the real noise level function by fitting a lower envelope to the standard deviations of per-segment image variances. For denoising, the chrominance of color noise is significantly removed by projecting pixel values onto a line fit to the RGB values in each segment. Then, a Gaussian conditional random field (GCRF) is constructed to obtain the underlying clean image from the noisy input. Extensive experiments are conducted to test the proposed algorithm, which is shown to outperform state-of-the-art denoising algorithms.
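The estimation half of the pipeline can be sketched by binning pixels by brightness and measuring the residual spread in each bin, a simplified stand-in for the paper's per-segment lower-envelope fit (and the ground-truth clean image is passed in directly here, where the real method must estimate it from the noisy input):

```python
import numpy as np

def noise_level_function(noisy, clean, bins=10):
    """Per-brightness noise std: bin pixels by the clean intensity and
    take the std of the residual in each bin. The paper instead fits a
    lower envelope to per-segment standard deviations; this is only the
    brightness-dependence idea."""
    residual = noisy - clean
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(clean, edges) - 1, 0, bins - 1)
    return np.array([residual[idx == b].std() for b in range(bins)])

rng = np.random.default_rng(0)
clean = rng.random(100_000)                    # pixel brightness in [0, 1)
sigma = 0.02 + 0.1 * clean                     # noise grows with brightness
noisy = clean + rng.normal(size=clean.size) * sigma
nlf = noise_level_function(noisy, clean)       # should increase with brightness
```

A brightness-dependent curve like this is exactly what the AWGN assumption misses, and recovering it is what makes the denoiser automatic.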
Affiliations
- Ce Liu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA.
22.
Abstract
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. We present a multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required and, therefore, the runtime cost of the classifier, is observed to scale approximately logarithmically with the number of classes. The features selected by joint training are generic edge-like features, whereas the features chosen by training each class separately tend to be more object-specific. The generic features generalize better and considerably reduce the computational cost of multiclass object detection.
Affiliations
- Antonio Torralba
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
23. Joshi N, Matusik W, Avidan S, Pfister H, Freeman WT. Exploring defocus matting: nonparametric acceleration, super-resolution, and off-center matting. IEEE Comput Graph Appl 2007; 27:43-52. PMID: 17388202. DOI: 10.1109/MCG.2007.32.
Abstract
Defocus matting is a fully automatic and passive method for pulling mattes from video captured with coaxial cameras that have different depths of field and planes of focus. Nonparametric sampling can accelerate the video-matting process from minutes to seconds per frame. In addition, a super-resolution technique efficiently bridges the gap between mattes from high-resolution video cameras and those from low-resolution cameras. Off-center matting pulls mattes for an external high-resolution camera that doesn't share the same center of projection as the low-resolution cameras used to capture the defocus matting data.
Affiliations
- Neel Joshi
- Computer Science and Engineering Department, University of California, San Diego, USA.
25. Brainard DH, Longère P, Delahunt PB, Freeman WT, Kraft JM, Xiao B. Bayesian model of human color constancy. J Vis 2006; 6:1267-1281. PMID: 17209734. PMCID: PMC2396883. DOI: 10.1167/6.11.10.
Abstract
Vision is difficult because images are ambiguous about the structure of the world. For object color, the ambiguity arises because the same object reflects a different spectrum to the eye under different illuminations. Human vision typically does a good job of resolving this ambiguity, an ability known as color constancy. The past 20 years have seen an explosion of work on color constancy, with advances in both experimental methods and computational algorithms. Here, we connect these two lines of research by developing a quantitative model of human color constancy. The model includes an explicit link between psychophysical data and illuminant estimates obtained via a Bayesian algorithm. The model is fit to the data through a parameterization of the prior distribution of illuminant spectral properties. The fit to the data is good, and the derived prior provides a succinct description of human performance.
Affiliations
- David H Brainard
- Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA.
26.
Abstract
Interpreting real-world images requires the ability to distinguish the different characteristics of the scene that lead to its final appearance. Two of the most important of these characteristics are the shading and reflectance of each point in the scene. We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, given the lighting direction, each image derivative is classified as being caused by shading or a change in the surface's reflectance. The classifiers gather local evidence about the surface's form and color, which is then propagated using the Generalized Belief Propagation algorithm. The propagation step disambiguates areas of the image where the correct classification is not clear from local evidence. We use real-world images to demonstrate results and show how each component of the system affects the results.
Affiliations
- Marshall F Tappen
- MIT Computer Science and Artificial Intelligence Laboratory, The Stata Center, Building 32, 32 Vassar Street, Cambridge, MA 02139, USA.
27.
Abstract
Graphical models, such as Bayesian networks and Markov random fields, represent statistical dependencies of variables by a graph. Local "belief propagation" rules of the sort proposed by Pearl (1988) are guaranteed to converge to the correct posterior probabilities in singly connected graphs. Recently, good performance has been obtained by using these same rules on graphs with loops, a method we refer to as loopy belief propagation. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes," whose decoding algorithm is equivalent to loopy propagation. Except for the case of graphs with a single loop, there has been little theoretical understanding of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables. We give an analytical formula relating the true posterior probabilities with those calculated using loopy propagation. We give sufficient conditions for convergence and show that when belief propagation converges, it gives the correct posterior means for all graph topologies, not just networks with a single loop. These results motivate using the powerful belief propagation algorithm in a broader class of networks and help clarify the empirical performance results.
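The Gaussian case is concrete enough to sketch: messages carry a scalar precision and potential, and, as the analysis predicts, the converged means on a loopy graph match the exact solve. This is a plain dense-matrix implementation on a made-up cycle graph, not an optimized message schedule:

```python
import numpy as np

def gabp_means(J, h, iters=100):
    """Gaussian belief propagation on the (possibly loopy) graph defined
    by precision matrix J and potential h. Per the paper's result, if the
    iteration converges, the returned means equal J^-1 h."""
    n = len(J)
    P = np.zeros((n, n))     # message precisions, P[i, j]: i -> j
    H = np.zeros((n, n))     # message potentials
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and J[i, j] != 0]
    for _ in range(iters):
        for i, j in edges:
            # Marginal at i using all incoming messages except j's.
            Pi = J[i, i] + P[:, i].sum() - P[j, i]
            Hi = h[i] + H[:, i].sum() - H[j, i]
            P[i, j] = -J[i, j] ** 2 / Pi
            H[i, j] = -J[i, j] * Hi / Pi
    Pb = np.diag(J) + P.sum(axis=0)      # belief precisions
    Hb = h + H.sum(axis=0)               # belief potentials
    return Hb / Pb

# A single-loop (cycle) Gaussian MRF with a diagonally dominant precision
# matrix, for which the iteration is known to converge.
J = np.array([[3.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 1.0],
              [1.0, 0.0, 1.0, 3.0]])
h = np.array([1.0, 0.0, 2.0, -1.0])
means = gabp_means(J, h)
```

The belief variances `1 / Pb`, by contrast, are generally incorrect on loopy graphs; it is only the means that the paper shows are exact at convergence.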
Affiliations
- Y Weiss
- Computer Science Division, University of California, Berkeley CA 94720-1776, USA
28.
Abstract
Perceptual systems routinely separate "content" from "style," classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive (Hofstadter, 1985). Existing factor models (Mardia, Kent, & Bibby, 1979; Hinton & Zemel, 1994; Ghahramani, 1995; Bell & Sejnowski, 1995; Hinton, Dayan, Frey, & Neal, 1995; Dayan, Hinton, Neal, & Zemel, 1995; Hinton & Ghahramani, 1997) are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. We present a general framework for learning to solve two-factor tasks using bilinear models, which provide sufficiently expressive representations of factor interactions but can nonetheless be fit to data using efficient algorithms based on the singular value decomposition and expectation-maximization. We report promising results on three different tasks in three different perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
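Fitting the asymmetric bilinear model is exactly a truncated SVD of the stacked observation matrix. A sketch on synthetic data generated from the model (dimensions are arbitrary; the EM fitting used for the symmetric model and missing-data settings is not shown):

```python
import numpy as np

def fit_bilinear(Y, S, d, J):
    """Fit the asymmetric bilinear model y_sc = A_s b_c by truncated SVD.
    Y is the (S*d) x C stack of observations, style-major by rows.
    Returns per-style maps A (S x d x J) and content codes B (J x C)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :J] * s[:J]).reshape(S, d, J)   # style-specific linear maps
    B = Vt[:J]                                # one code vector per content
    return A, B

rng = np.random.default_rng(0)
S, C, d, J = 4, 6, 5, 2                       # styles, contents, obs dim, factors
A_true = rng.normal(size=(S, d, J))
B_true = rng.normal(size=(J, C))
Y = np.concatenate([A_true[s] @ B_true for s in range(S)], axis=0)  # (S*d) x C
A, B = fit_bilinear(Y, S, d, J)
recon = np.concatenate([A[s] @ B for s in range(S)], axis=0)
```

Extrapolation then works by mixing the learned factors: a content code `B[:, c]` estimated in one style can be rendered in another style via that style's map `A[s]`, which is the mechanism behind the font and face translation tasks.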
Affiliation(s)
- J B Tenenbaum
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge 02139, USA
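The SVD-based fitting procedure mentioned in the abstract can be sketched on synthetic data. The following toy (dimensions and random data are arbitrary choices, not from the paper) fits the asymmetric bilinear model, in which each observation is a style-specific linear map applied to a content vector:

```python
import numpy as np

rng = np.random.default_rng(0)
S, C, d, J = 4, 5, 10, 3   # styles, content classes, obs dim, model dim

# Synthetic data from an asymmetric bilinear model: y_sc = A_s @ b_c,
# with A_s a style-specific map and b_c a content vector.
A_true = rng.normal(size=(S, d, J))
b_true = rng.normal(size=(J, C))
Y = np.vstack([A_true[s] @ b_true for s in range(S)])   # shape (S*d, C)

# Fit: stack observations by style and take a rank-J SVD; the left factor
# holds the stacked style maps, the right factor the content vectors.
U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
A_hat = U[:, :J] * sv[:J]        # stacked style-specific maps
B_hat = Vt[:J]                   # content representation
err = np.linalg.norm(Y - A_hat @ B_hat) / np.linalg.norm(Y)
```

Because the synthetic data are exactly rank J, the reconstruction error is numerically zero; on real data the truncated SVD gives the least-squares bilinear fit, and EM handles the missing-data cases the abstract refers to.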
29
30
Abstract
The problem of color constancy may be solved if we can recover the physical properties of illuminants and surfaces from photosensor responses. We consider this problem within the framework of Bayesian decision theory. First, we model the relation among illuminants, surfaces, and photosensor responses. Second, we construct prior distributions that describe the probability that particular illuminants and surfaces exist in the world. Given a set of photosensor responses, we can then use Bayes's rule to compute the posterior distribution for the illuminants and the surfaces in the scene. There are two widely used methods for obtaining a single best estimate from a posterior distribution. These are maximum a posteriori (MAP) and minimum mean-square-error (MMSE) estimation. We argue that neither is appropriate for perception problems. We describe a new estimator, which we call the maximum local mass (MLM) estimate, that integrates local probability density. The new method uses an optimality criterion that is appropriate for perception tasks: It finds the most probable approximately correct answer. For the case of low observation noise, we provide an efficient approximation. We develop the MLM estimator for the color-constancy problem in which flat matte surfaces are uniformly illuminated. In simulations we show that the MLM method performs better than the MAP estimator and better than a number of standard color-constancy algorithms. We note conditions under which even the optimal estimator produces poor estimates: when the spectral properties of the surfaces in the scene are biased.
Affiliation(s)
- D H Brainard
- Department of Psychology, University of California, Santa Barbara 93106, USA
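The distinction between MAP and the maximum-local-mass estimate can be illustrated in one dimension. In this sketch (the posterior shape and the ±0.25 tolerance window are invented for illustration, not taken from the paper), a tall narrow spike wins under MAP while the broad bump carrying more probability mass wins under MLM:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 10001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Posterior: a tall narrow spike at 2 plus a broad, heavier bump at 7.
post = 0.1 * gauss(x, 2.0, 0.02) + 0.9 * gauss(x, 7.0, 1.0)
post /= post.sum() * dx

x_map = x[np.argmax(post)]            # maximum a posteriori: picks the spike

# Maximum local mass: score each candidate by the posterior probability of
# being approximately correct, i.e. the mass within a tolerance window.
half = int(0.25 / dx)
kernel = np.ones(2 * half + 1)
local_mass = np.convolve(post, kernel, mode="same") * dx
x_mlm = x[np.argmax(local_mass)]      # picks the broad bump
```

The spike has the highest density but only 10% of the mass, so MAP returns ≈2 while MLM returns ≈7, the "most probable approximately correct" answer described in the abstract.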
31
Abstract
A visual system makes assumptions in order to interpret visual data. The assumption of 'generic view' states that the observer is not in a special position relative to the scene. Researchers commonly use a binary decision of generic or accidental view to disqualify scene interpretations that assume accidental viewpoints. Here we show how to use the generic view assumption, and others like it, to quantify the likelihood of a view, adding a new term to the probability of a given image interpretation. The resulting framework better models the visual world and reduces the reliance on other prior assumptions. It may lead to computer vision algorithms of greater power and accuracy, or to better models of human vision. We show applications to the problems of inferring shape, surface reflectance properties, and motion from images.
Affiliation(s)
- W T Freeman
- Mitsubishi Electric Research Laboratories, Cambridge, Massachusetts 02139
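The quantitative use of the generic view assumption amounts to marginalizing the image likelihood over the viewpoint. The toy below (both "renderers" are hypothetical stand-ins, not examples from the paper) compares a scene hypothesis that reproduces the observed measurement over a wide range of viewpoints against one that reproduces it only near a single accidental viewpoint:

```python
import numpy as np

y_obs = 1.0      # observed image measurement (e.g., a projected length)
sigma = 0.05     # observation noise

theta = np.linspace(-np.pi / 2, np.pi / 2, 20001)   # viewpoint variable
dtheta = theta[1] - theta[0]
p_theta = np.full_like(theta, 1.0 / (theta[-1] - theta[0]))  # uniform prior

def evidence(render):
    """Marginalize the generic variable: p(y|scene) = ∫ p(y|scene,θ) p(θ) dθ."""
    lik = np.exp(-0.5 * ((y_obs - render(theta)) / sigma) ** 2)
    return np.sum(lik * p_theta) * dtheta

# Hypothesis A (accidental): a unit rod whose projected length cos(θ)
# matches y_obs only near θ = 0, i.e. only from a special viewpoint.
ev_accidental = evidence(lambda t: np.cos(t))
# Hypothesis B (generic): a scene whose rendered measurement matches y_obs
# regardless of viewpoint.
ev_generic = evidence(lambda t: np.ones_like(t))
```

Marginalization automatically penalizes the interpretation whose image is sensitive to the viewpoint, which is the extra likelihood term the abstract describes: the generic hypothesis ends up with higher evidence even though both fit the data perfectly at their best viewpoint.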
32
Abstract
We describe a technique for displaying patterns that appear to move continuously without changing their positions. The method uses a quadrature pair of oriented filters to vary the local phase, giving the sensation of motion. We have used this technique in various computer graphic and scientific visualization applications.
Affiliation(s)
- William T. Freeman
- The Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA
- Edward H. Adelson
- The Media Laboratory, Massachusetts Institute of Technology, Cambridge, MA
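The phase-varying construction can be sketched in one dimension. This toy (filter sizes and frequencies are arbitrary choices) builds a quadrature pair of Gabor filters and blends their responses with time-varying weights, so the local phase advances while the pattern stays in place:

```python
import numpy as np

n = 256
x = np.arange(n)
signal = np.cos(2 * np.pi * x / 32)          # input pattern, period 32

# Quadrature pair: even (cosine-phase) and odd (sine-phase) Gabor filters.
k = np.arange(-48, 49)
env = np.exp(-0.5 * (k / 12.0) ** 2)
even = env * np.cos(2 * np.pi * k / 32)
odd = env * np.sin(2 * np.pi * k / 32)

e = np.convolve(signal, even, mode="same")   # even-phase response
o = np.convolve(signal, odd, mode="same")    # odd-phase response

def frame(phi):
    """One animation frame: vary the local phase, not the position."""
    return np.cos(phi) * e + np.sin(phi) * o

# For a sinusoidal input, cos(phi)*cos(wx) + sin(phi)*sin(wx) = cos(wx - phi),
# so successive frames are phase-advanced copies of the same pattern.
f0, f1 = frame(0.0), frame(np.pi / 2)
```

Sweeping `phi` over 0..2π and displaying the frames produces the illusion: each feature appears to drift continuously, yet after a full cycle nothing has changed position.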
33
Freeman WT. Dermatitis Artefacta. Proc R Soc Med 1914; 7:55-56. [PMID: 19977721 PMCID: PMC2003989]
34
Freeman WT. Bromide Eruption. Proc R Soc Med 1914; 7:119-120. [PMID: 19977599 PMCID: PMC2003880]
35
Freeman WT. Case for Diagnosis. Proc R Soc Med 1908; 1:14. [PMID: 19972794 PMCID: PMC2046348]