1. In the Eye of Transformer: Global-Local Correlation for Egocentric Gaze Estimation and Beyond. Int J Comput Vis 2023; 132:854-871. doi:10.1007/s11263-023-01879-7. PMID: 38371492; PMCID: PMC10873248.
Abstract
Predicting a person's gaze from egocentric videos plays a critical role in understanding human intention in daily activities. In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze estimation. We observe that the connection between the global scene context and local visual information is vital for localizing gaze fixation in egocentric video frames. To this end, we design the transformer encoder to embed the global context as one additional visual token and further propose a novel global-local correlation module to explicitly model the correlation between the global token and each local token. We validate our model on two egocentric video datasets, EGTEA Gaze+ and Ego4D. Our detailed ablation studies demonstrate the benefits of our method, and our approach exceeds the previous state-of-the-art model by a large margin. We also apply our model to a novel gaze saccade/fixation prediction task and the traditional action recognition problem. The consistent gains suggest the strong generalization capability of our model. We also provide additional visualizations to support our claim that global-local correlation serves as a key representation for predicting gaze fixation from egocentric videos. More details can be found on our website (https://bolinlai.github.io/GLC-EgoGazeEst).
2. In the Eye of the Beholder: Gaze and Actions in First Person Video. IEEE Trans Pattern Anal Mach Intell 2023; 45:6731-6747. doi:10.1109/tpami.2021.3051319. PMID: 33449877.
Abstract
We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a head-worn camera. To facilitate our research, we first introduce the EGTEA Gaze+ dataset. Our dataset comes with videos, gaze tracking data, hand masks, and action annotations, thereby providing the most comprehensive benchmark for First Person Vision (FPV). Moving beyond the dataset, we propose a novel deep model for joint gaze estimation and action recognition in FPV. Our method describes the participant's gaze as a probabilistic variable and models its distribution using stochastic units in a deep network. We further sample from these stochastic units to generate an attention map that guides the aggregation of visual features for action recognition. Our method is evaluated on our EGTEA Gaze+ dataset and achieves a performance level that exceeds the state-of-the-art by a significant margin. More importantly, we demonstrate that our model can be applied to the larger-scale FPV dataset EPIC-Kitchens even without using gaze, offering new state-of-the-art results on FPV action recognition.
3. mRisk: Continuous Risk Estimation for Smoking Lapse from Noisy Sensor Data with Incomplete and Positive-Only Labels. Proc ACM Interact Mob Wearable Ubiquitous Technol 2022; 6:143. doi:10.1145/3550308. PMID: 36873428; PMCID: PMC9979627.
Abstract
Passive detection of risk factors (that may influence unhealthy or adverse behaviors) via wearable and mobile sensors has created new opportunities to improve the effectiveness of behavioral interventions. A key goal is to find opportune moments for intervention by passively detecting the rising risk of an imminent adverse behavior. This has been difficult, however, due to substantial noise in the data collected by sensors in the natural environment and a lack of reliable label assignment of low- and high-risk states to the continuous stream of sensor data. In this paper, we propose an event-based encoding of sensor data to reduce the effect of noise and then present an approach to efficiently model the historical influence of recent and past sensor-derived contexts on the likelihood of an adverse behavior. Next, to circumvent the lack of any confirmed negative labels (i.e., time periods with no high-risk moment) and the scarcity of positive labels (i.e., detected adverse behaviors), we propose a new loss function. We use 1,012 days of sensor and self-report data collected from 92 participants in a smoking cessation field study to train deep learning models that produce a continuous risk estimate for the likelihood of an impending smoking lapse. The risk dynamics produced by the model show that risk peaks an average of 44 minutes before a lapse. Simulations on the field study data show that using our model can create intervention opportunities for 85% of lapses, with 5.5 interventions per day.
4. The mobile assistance for regulating smoking (MARS) micro-randomized trial design protocol. Contemp Clin Trials 2021; 110:106513. doi:10.1016/j.cct.2021.106513. PMID: 34314855; PMCID: PMC8824313.
Abstract
Smoking is the leading preventable cause of death and disability in the U.S. Empirical evidence suggests that engaging in evidence-based self-regulatory strategies (e.g., behavioral substitution, mindful attention) can improve smokers' ability to resist craving and build self-regulatory skills. However, poor engagement represents a major barrier to maximizing the impact of self-regulatory strategies. This paper describes the protocol for Mobile Assistance for Regulating Smoking (MARS), a research study designed to inform the development of a mobile health (mHealth) intervention for promoting real-time, real-world engagement in evidence-based self-regulatory strategies. The study will employ a 10-day Micro-Randomized Trial (MRT) enrolling 112 smokers attempting to quit. Utilizing a mobile smoking cessation app, the MRT will randomize each individual multiple times per day to either: (a) no intervention prompt; (b) a prompt recommending brief (low effort) cognitive and/or behavioral self-regulatory strategies; or (c) a prompt recommending more effortful cognitive or mindfulness-based strategies. Prompts will be delivered via push notifications from the MARS mobile app. The goal is to investigate whether, what type of, and under what conditions prompting the individual to engage in self-regulatory strategies increases engagement. The results will build the empirical foundation necessary to develop an mHealth intervention that effectively utilizes intensive longitudinal self-report and sensor-based assessments of emotions, context, and other factors to engage an individual in the type of self-regulatory activity that would be most beneficial given their real-time, real-world circumstances. This type of mHealth intervention holds enormous potential to expand the reach and impact of smoking cessation treatments.
5. Detection of eye contact with deep neural networks is as accurate as human experts. Nat Commun 2020; 11:6386. doi:10.1038/s41467-020-19712-x. PMID: 33318484; PMCID: PMC7736573.
Abstract
Eye contact is among the most fundamental means of social communication used by humans. Quantification of eye contact is valuable as a part of the analysis of social roles and communication skills, and for clinical screening. Estimating a subject's looking direction is a challenging task, but eye contact can be effectively captured by a wearable point-of-view camera, which provides a unique viewpoint. While moments of eye contact from this viewpoint can be hand-coded, such a process tends to be laborious and subjective. In this work, we develop a deep neural network model to automatically detect eye contact in egocentric video; it is the first to achieve accuracy equivalent to that of human experts. We train a deep convolutional network on a dataset of 4,339,879 annotated images from 103 subjects with diverse demographic backgrounds, 57 of whom have a diagnosis of Autism Spectrum Disorder. The network achieves an overall precision of 0.936 and recall of 0.943 on 18 validation subjects, on par with 10 trained human coders, who achieved a mean precision of 0.918 and recall of 0.946. Our method will be instrumental in gaze behavior analysis by serving as a scalable, objective, and accessible tool for clinicians and researchers.
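For reference, the precision and recall figures quoted above are computed from frame-level counts of true positives, false positives, and false negatives; a minimal sketch (with invented labels, not the study's data) is:

```python
def precision_recall(y_true, y_pred):
    """Frame-level precision and recall for binary eye-contact labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Ten illustrative frames (1 = eye contact); not data from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0]
p, r = precision_recall(y_true, y_pred)  # both 0.8 on this toy example
```

At scale, the same counts are simply accumulated over all validation frames before the two ratios are taken.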
6. A Robust Functional EM Algorithm for Incomplete Panel Count Data. Adv Neural Inf Process Syst 2020; 33:19828-19838. PMID: 34103881; PMCID: PMC8182728.
Abstract
Panel count data describe aggregated counts of recurrent events observed at discrete time points. To understand the dynamics of health behaviors and predict future negative events, quantitative behavioral research has come to rely increasingly on panel count data collected via multiple self-reports, for example, about frequencies of smoking, using in-the-moment surveys on mobile devices. However, missing reports are common and present a major barrier to downstream statistical learning. As a first step, under a missing-completely-at-random (MCAR) assumption, we propose a simple yet widely applicable functional EM algorithm to estimate the counting process mean function, which is of central interest to behavioral scientists. The proposed approach wraps several popular panel count inference methods, seamlessly deals with incomplete counts, and is robust to misspecification of the Poisson process assumption. Theoretical analysis of the proposed algorithm provides finite-sample guarantees by extending parametric EM theory [3, 34] to the general non-parametric setting. We illustrate the utility of the proposed algorithm through numerical experiments and an analysis of smoking cessation data. We also discuss useful extensions to address deviations from the MCAR assumption and covariate effects.
7. Noninvasive three-state sleep-wake staging in mice using electric field sensors. J Neurosci Methods 2020; 344:108834. doi:10.1016/j.jneumeth.2020.108834. PMID: 32619585; PMCID: PMC7454007.
Abstract
STUDY OBJECTIVE: To validate a novel method for sleep-wake staging in mice using noninvasive electric field (EF) sensors.
METHODS: Mice were implanted with electroencephalogram (EEG) and electromyogram (EMG) electrodes and housed individually. Noninvasive EF sensors were attached to the exterior of each chamber to record respiration and other movement simultaneously with EEG, EMG, and video. A sleep-wake scoring method based on EF sensor data was developed with reference to EEG/EMG and then validated by three expert scorers. Additionally, novice scorers without sleep-wake scoring experience taught themselves to score sleep using only the EF sensor data, and their results were compared with those of the expert scorers. Lastly, the ability to capture three-state sleep-wake staging with EF sensors attached to traditional mouse home-cages was tested.
RESULTS: EF sensors quantified wake, rapid eye movement (REM) sleep, and non-REM (NREM) sleep with high agreement (>93%) and with inter- and intra-scorer error comparable to EEG/EMG. Novice scorers successfully learned sleep-wake scoring using only EF sensor data and the scoring criteria, achieving high agreement with expert scorers (>91%). When applied to traditional home-cages, EF sensors enabled classification of three-state (wake, NREM, and REM) sleep-wake architecture independent of EEG/EMG.
CONCLUSIONS: EF sensors score three-state sleep-wake architecture with high agreement to conventional EEG/EMG sleep-wake scoring 1) without invasive surgery, 2) from outside the home-cage, and 3) without requiring specialized training or equipment. EF sensors provide an alternative method to assess rodent sleep for animal models and research laboratories in which EEG/EMG is not possible or where noninvasive approaches are preferred.
8. Detecting Suspected Pump Thrombosis in Left Ventricular Assist Devices via Acoustic Analysis. IEEE J Biomed Health Inform 2020; 24:1899-1906. doi:10.1109/jbhi.2020.2966178. PMID: 31940570; PMCID: PMC7380556.
Abstract
OBJECTIVE: Left ventricular assist devices (LVADs) fail in up to 10% of patients due to the development of pump thrombosis. Remote monitoring of patients with LVADs can enable early detection and, subsequently, treatment and prevention of pump thrombosis. We assessed whether acoustical signals measured on the chest of patients with LVADs, combined with machine learning algorithms, can be used to detect pump thrombosis.
METHODS: Thirteen centrifugal pump (HVAD) recipients were enrolled in the study. When a patient was hospitalized for suspected pump thrombosis, clinical data and acoustical recordings were obtained at admission, before and after administration of thrombolytic therapy, and every 24 hours until laboratory and pump parameters normalized. We first selected the most important features in our feature set using LDH-based correlation analysis. Using these features, we then trained a logistic regression model and determined a decision threshold to differentiate between thrombosis and non-thrombosis episodes.
RESULTS: Accuracy, sensitivity, and precision were 88.9%, 90.9%, and 83.3%, respectively. When tested on the post-thrombolysis data, our algorithm suggested possible pump abnormalities that were not identified by the reference pump power or biomarker abnormalities.
SIGNIFICANCE: We showed that the acoustical signatures of LVADs can serve as an index of mechanical deterioration and, when combined with machine learning algorithms, provide clinical decision support regarding the presence of pump thrombosis.
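The final stage of the pipeline described above, a logistic regression applied at a chosen decision threshold, can be sketched generically. The synthetic features below stand in for the study's LDH-selected acoustic features, and the 0.5 threshold is a placeholder for the threshold tuned in the paper.

```python
import numpy as np

# Synthetic stand-in for two well-separated acoustic-feature classes
# (0 = non-thrombosis, 1 = thrombosis); values are illustrative only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (30, 3)), rng.normal(3.0, 1.0, (30, 3))])
y = np.concatenate([np.zeros(30), np.ones(30)])

# Fit logistic regression by plain gradient descent on the log-loss.
w, b = np.zeros(3), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * float(np.mean(p - y))

# Classify with an explicit decision threshold (0.5 here is a placeholder;
# the study tunes its threshold on its own data).
probs = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
accuracy = float(((probs >= 0.5) == y).mean())
```

In practice the threshold would be chosen on held-out episodes to trade sensitivity against precision, as the reported 90.9%/83.3% figures reflect.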
9. SmokingOpp: Detecting the Smoking 'Opportunity' Context Using Mobile Sensors. Proc ACM Interact Mob Wearable Ubiquitous Technol 2020; 4. doi:10.1145/3380987. PMID: 34651096.
Abstract
Context plays a key role in impulsive adverse behaviors such as fights, suicide attempts, binge-drinking, and smoking lapse. Several contexts dissuade such behaviors, but some may trigger adverse impulsive behaviors. We define these latter contexts as 'opportunity' contexts, as their passive detection from sensors can be used to deliver context-sensitive interventions. In this paper, we define the general concept of 'opportunity' contexts and apply it to the case of smoking cessation. We operationalize the smoking 'opportunity' context, using self-reported smoking allowance and cigarette availability. We show its clinical utility by establishing its association with smoking occurrences using Granger causality. Next, we mine several informative features from GPS traces, including the novel location context of smoking spots, to develop the SmokingOpp model for automatically detecting the smoking 'opportunity' context. Finally, we train and evaluate the SmokingOpp model using 15 million GPS points and 3,432 self-reports from 90 newly abstinent smokers in a smoking cessation study.
10. Classification of Decompensated Heart Failure From Clinical and Home Ballistocardiography. IEEE Trans Biomed Eng 2019; 67:1303-1313. doi:10.1109/tbme.2019.2935619. PMID: 31425011.
Abstract
OBJECTIVE: To improve home monitoring of heart failure (HF) patients so as to reduce emergency room visits and hospital readmissions, by analyzing the ballistocardiogram (BCG) to evaluate the clinical state of the patient.
METHODS: 1) High-quality BCG signals were collected at home from HF patients after discharge. 2) The BCG recordings were preprocessed to exclude outliers and artifacts. 3) Parameters of the BCG that contain information about the cardiovascular system were extracted and used to classify each recording according to HF status.
RESULTS: The best AUC for the classification task was 0.78, obtained using a slight variant of leave-one-subject-out validation.
CONCLUSION: This work demonstrates that high-quality BCG signals can be collected in a home environment and used to detect the clinical state of HF patients.
SIGNIFICANCE: In future work, a clinician/caregiver can be introduced into the system so that appropriate interventions can be performed based on the clinical state monitored at home.
12. Dermoscopy diagnosis of cancerous lesions utilizing dual deep learning algorithms via visual and audio (sonification) outputs: Laboratory and prospective observational studies. EBioMedicine 2019; 40:176-183. doi:10.1016/j.ebiom.2019.01.028. PMID: 30674442; PMCID: PMC6413349.
Abstract
BACKGROUND: Early diagnosis of skin cancer lesions by dermoscopy, the gold standard in dermatological imaging, calls for a diagnostic upscale. The aim of the study was to improve the accuracy of dermoscopic skin cancer diagnosis through the use of novel deep learning (DL) algorithms. An additional sonification-derived diagnostic layer was added to the visual classification to increase sensitivity.
METHODS: Two parallel studies were conducted: a laboratory retrospective study (LABS, n = 482 biopsies) and a non-interventional prospective observational study (OBS, n = 63 biopsies). A training data set of biopsy-verified reports of normal and cancerous skin lesions (n = 3954) was used to develop a DL classifier based on visual features (System A). The outputs of the classifier were sonified, i.e., converted into sound (System B). The derived sound files were analyzed by a second machine learning classifier, either as raw audio (LABS, OBS) or following conversion into spectrograms (LABS), and by image analysis and human heuristics (OBS). The OBS outcome criteria were System A specificity and System B sensitivity as raw sounds, spectrogram areas, or heuristics.
FINDINGS: LABS employed dermoscopies, half benign and half malignant, and compared the accuracy of Systems A and B. The System A algorithm achieved a ROC AUC of 0.976 (95% CI 0.965-0.987). Secondary machine learning analysis of raw sound, FFT, and spectrogram ROC curves yielded AUCs of 0.931 (95% CI 0.881-0.981), 0.90 (95% CI 0.838-0.963), and 0.988 (95% CI 0.973-1.001), respectively. OBS analysis of raw sound dermoscopies by the secondary machine learning classifier yielded a ROC AUC of 0.819 (95% CI 0.7956-0.8406), and OBS image analysis of spectrograms a ROC AUC of 0.808 (95% CI 0.6945-0.9208). Applying a heuristic analysis of Systems A and B in the clinical study gave a sensitivity of 86% and a specificity of 91%.
INTERPRETATION: Adding a second stage of processing, which includes a deep learning algorithm of sonification and heuristic inspection with machine learning, significantly improves diagnostic accuracy. A combined two-stage system is expected to assist clinical decisions and de-escalate the current trend of over-diagnosis of skin cancer lesions as pathological.
FUNDING: Bostel Technologies.
TRIAL REGISTRATION: clinicaltrials.gov identifier NCT03362138.
13. Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving. IEEE Trans Robot 2018. doi:10.1109/tro.2018.2865891.
14. Representing Sudden Shifts in Intensive Dyadic Interaction Data Using Differential Equation Models with Regime Switching. Psychometrika 2018; 83:476-510. doi:10.1007/s11336-018-9605-1. PMID: 29557080; PMCID: PMC7370947.
Abstract
A growing number of social scientists have turned to differential equations as a tool for capturing the dynamic interdependence among a system of variables. Current tools for fitting differential equation models do not provide a straightforward mechanism for diagnosing evidence of qualitative shifts in dynamics, nor do they provide ways of identifying the timing and possible determinants of such shifts. In this paper, we discuss regime-switching differential equation models, a novel modeling framework for representing abrupt changes in a system of differential equation models. Estimation was performed by combining the Kim filter (Kim and Nelson, State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications, MIT Press, Cambridge, 1999) and a numerical differential equation solver that can handle both ordinary and stochastic differential equations. The proposed approach was motivated by the need to represent discrete shifts in the movement dynamics of [Formula: see text] mother-infant dyads during the Strange Situation Procedure (SSP), a behavioral assessment in which the infant is separated from and reunited with the mother twice. We illustrate the utility of a novel regime-switching differential equation model in representing children's tendency to shift between the goal of staying close to their mothers and intermittent interest in moving away from their mothers to explore the room during the SSP. Results from empirical model fitting were supplemented with a Monte Carlo simulation study evaluating the use of information criterion measures to diagnose sudden shifts in dynamics.
15. Inferring Object Properties with a Tactile-Sensing Array Given Varying Joint Stiffness and Velocity. Int J Hum Robot 2018. doi:10.1142/s0219843617500244.
Abstract
Whole-arm tactile sensing enables a robot to sense contact and infer contact properties across its entire arm. In this paper, we demonstrate that, using data-driven methods, a humanoid robot can infer the mechanical properties of objects from contact with its forearm during a simple reaching motion. A key issue is the extent to which the performance of data-driven methods generalizes to robot actions that differ from those used during training. To investigate this, we developed an idealized physics-based lumped-element model of a robot with a compliant joint making contact with an object. Using this model, we performed experiments with varied robot, object, and environment parameters. We also collected data from a tactile-sensing forearm on a real robot as it made contact with various objects during a simple reaching motion with varied arm velocities and joint stiffnesses. The robot used 1-nearest-neighbor (1-NN) classifiers, hidden Markov models (HMMs), and long short-term memory (LSTM) networks to infer two object properties (hard versus soft, and moved versus unmoved) from features of time-varying tactile sensor data (maximum force, contact area, and contact motion). We found that, in contrast to 1-NN, the performance of LSTMs (given sufficient data) and multivariate HMMs successfully generalized to new robot motions with distinct velocities and joint stiffnesses. Using multiple features gave better results than single features in both the physics-based model experiments and the real-robot experiments.
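Of the classifiers compared above, 1-nearest-neighbor is the simplest; a minimal sketch over the three named features is shown below. The feature values and labels are invented for illustration, not the paper's robot data.

```python
import math

def one_nn(train, query):
    """1-nearest-neighbor over (features, label) pairs using Euclidean distance."""
    return min(train, key=lambda ex: math.dist(ex[0], query))[1]

# Illustrative (max_force, contact_area, contact_motion) feature vectors.
train = [
    ((6.0, 2.0, 0.1), "hard"),
    ((5.5, 1.8, 0.2), "hard"),
    ((2.0, 4.0, 0.8), "soft"),
    ((1.5, 3.5, 1.0), "soft"),
]
label = one_nn(train, (5.8, 2.1, 0.15))  # nearest training example is "hard"
```

Because 1-NN memorizes the training trajectories, its predictions degrade when test-time velocities or stiffnesses shift the feature distribution, which is the generalization gap the paper reports.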
16.
Abstract
Children with autism have atypical gaze behavior, but it is unknown whether gaze differs during distinct types of reciprocal interactions. Typically developing children (N = 20) and children with autism (N = 20) (4-13 years) made similar amounts of eye contact with an examiner during a conversation. Surprisingly, there was minimal eye contact during interactive play in both groups. Gaze behavior was stable across 8 weeks in children with autism (N = 15). Lastly, gaze behavior during conversation, but not play, was associated with autism social affect severity scores (ADOS CSS SA) and the Social Responsiveness Scale (SRS-2). Together, these findings suggest that eye contact in typical and atypical development is influenced by subtle changes in context, which has implications for optimizing assessments of social communication skills.
17. Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives. Artif Intell Rev 2017. doi:10.1007/s10462-017-9569-z.
18. Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K). IEEE Pervasive Comput 2017; 16:18-22. doi:10.1109/mprv.2017.29. PMID: 29276451; PMCID: PMC5739587.
19. Real-world visual statistics and infants' first-learned object names. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160055. doi:10.1098/rstb.2016.0055. PMID: 27872373; PMCID: PMC5124080.
Abstract
We offer a new solution to the unsolved problem of how infants break into word learning, based on the visual statistics of everyday infant-perspective scenes. Images from head camera video captured by 8 1/2- to 10 1/2-month-old infants at 147 at-home mealtime events were analysed for the objects in view. The images were found to be highly cluttered, with many different objects in view. However, the frequency distribution of object categories was extremely right-skewed, such that a very small set of objects was pervasively present, a fact that may substantially reduce the problem of referential ambiguity. The statistical structure of objects in these infant egocentric scenes differs markedly from that in the training sets used in computational models and in experiments on statistical word-referent learning. Therefore, the results also indicate a need to re-examine current explanations of how infants break into word learning. This article is part of the themed issue 'New frontiers for statistical learning in the cognitive sciences'.
20. iSurvive: An Interpretable, Event-time Prediction Model for mHealth. Proc Mach Learn Res 2017; 70:970-979. PMID: 30906932; PMCID: PMC6430609.
Abstract
An important mobile health (mHealth) task is the use of multimodal data, such as sensor streams and self-report, to construct interpretable time-to-event predictions of, for example, lapse to alcohol or illicit drug use. Interpretability of the prediction model is important for acceptance and adoption by domain scientists, enabling model outputs and parameters to inform theory and guide intervention design. Temporal latent state models are therefore attractive, and so we adopt the continuous time hidden Markov model (CT-HMM) due to its ability to describe irregular arrival times of event data. Standard CT-HMMs, however, are not specialized for predicting the time to a future event, the key variable for mHealth interventions. Also, standard emission models lack a sufficiently rich structure to describe multimodal data and incorporate domain knowledge. We present iSurvive, an extension of classical survival analysis to a CT-HMM. We present a parameter learning method for GLM emissions and survival model fitting, and present promising results on both synthetic data and an mHealth drug use dataset.
21.
Abstract
Three-dimensional (3D) kinematic models are widely used in video-based figure tracking. We show that these models can suffer from singularities when motion is directed along the viewing axis of a single camera. The single-camera case is important because it arises in many interesting applications, such as motion capture from movie footage, video surveillance, and vision-based user interfaces. We describe a novel two-dimensional scaled prismatic model (SPM) for figure registration. In contrast to 3D kinematic models, the SPM has fewer singularity problems and does not require detailed knowledge of the 3D kinematics. We fully characterize the singularities in the SPM and demonstrate tracking through singularities using synthetic and real examples. We demonstrate the application of our model to motion capture from movies: Fred Astaire is tracked in a clip from the film "Shall We Dance". We also present the use of monocular hand tracking in a 3D user interface. These results demonstrate the benefits of the SPM in tracking with a single source of video.
22. Center of excellence for mobile sensor data-to-knowledge (MD2K). J Am Med Inform Assoc 2015; 22:1137-1142. doi:10.1093/jamia/ocv056. PMID: 26555017.
Abstract
Mobile sensor data-to-knowledge (MD2K) was chosen as one of 11 Big Data Centers of Excellence by the National Institutes of Health, as part of its Big Data-to-Knowledge initiative. MD2K is developing innovative tools to streamline the collection, integration, management, visualization, analysis, and interpretation of health data generated by mobile and wearable sensors. The goal of the big data solutions being developed by MD2K is to reliably quantify physical, biological, behavioral, social, and environmental factors that contribute to health and disease risk. The research conducted by MD2K is targeted at improving health through early detection of adverse health events and by facilitating prevention. MD2K will make its tools, software, and training materials widely available and will also organize workshops and seminars to encourage their use by researchers and clinicians.
Collapse
|
23
|
Delving into Egocentric Actions. PROCEEDINGS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2015; 2015:287-295. [PMID: 26973427 DOI: 10.1109/cvpr.2015.7298625] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
We address the challenging problem of recognizing the camera wearer's actions from videos captured by an egocentric camera. Egocentric videos encode a rich set of signals regarding the camera wearer, including head movement, hand pose and gaze information. We propose to utilize these mid-level egocentric cues for egocentric action recognition. We present a novel set of egocentric features and show how they can be combined with motion and object features. The result is a compact representation with superior performance. In addition, we provide the first systematic evaluation of motion, object and egocentric cues in egocentric action recognition. Our benchmark leads to several surprising findings. These findings uncover the best practices for egocentric actions, with a significant performance boost over all previous state-of-the-art methods on three publicly available datasets.
Collapse
|
24
|
Gaze-enabled Egocentric Video Summarization via Constrained Submodular Maximization. PROCEEDINGS. IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2015; 2015:2235-2244. [PMID: 26973428 PMCID: PMC4784707 DOI: 10.1109/cvpr.2015.7298836] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
With the proliferation of wearable cameras, the number of videos of users documenting their personal lives using such devices is rapidly increasing. Since such videos may span hours, there is an important need for mechanisms that represent the information content in a compact form (i.e., shorter videos which are more easily browsable/sharable). Motivated by these applications, this paper focuses on the problem of egocentric video summarization. Such videos are usually continuous with significant camera shake and other quality issues. Because of these reasons, there is growing consensus that direct application of standard video summarization tools to such data yields unsatisfactory performance. In this paper, we demonstrate that using gaze tracking information (such as fixation and saccade) significantly helps the summarization task. It allows meaningful comparison of different image frames and enables deriving personalized summaries (gaze provides a sense of the camera wearer's intent). We formulate a summarization model which captures common-sense properties of a good summary, and show that it can be solved as a submodular function maximization with partition matroid constraints, opening the door to a rich body of work from combinatorial optimization. We evaluate our approach on a new gaze-enabled egocentric video dataset (over 15 hours), which will be a valuable standalone resource.
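The optimization at the heart of this approach can be sketched with a toy coverage objective: pick frames that cover the most visual "concepts" while a partition matroid limits how many frames each temporal block contributes. The paper's actual objective and constraints are richer; all names and numbers below are illustrative.

```python
def greedy_partition_matroid(gain_sets, groups, budgets):
    """Greedy maximization of a coverage (submodular) objective under a
    partition matroid: at most budgets[g] items from each group g.

    gain_sets[i] is the set of 'concepts' item i covers; groups[i] is the
    partition block of item i. Greedy selection by marginal gain gives a
    constant-factor approximation for monotone submodular objectives.
    """
    covered, picked = set(), []
    used = {g: 0 for g in budgets}
    while True:
        best, best_gain = None, 0
        for i, s in enumerate(gain_sets):
            if i in picked or used[groups[i]] >= budgets[groups[i]]:
                continue
            gain = len(s - covered)  # marginal coverage gain
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no feasible item adds coverage
            break
        picked.append(best)
        covered |= gain_sets[best]
        used[groups[best]] += 1
    return picked

# Four candidate frames in two temporal blocks, one pick allowed per block.
frames = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1}]
sel = greedy_partition_matroid(frames, [0, 0, 1, 1], {0: 1, 1: 1})
```

The matroid constraint is what keeps the summary spread across the video instead of clustering around one highly salient segment.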
Collapse
|
25
|
Efficient Learning of Continuous-Time Hidden Markov Models for Disease Progression. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2015; 28:3599-3607. [PMID: 27019571 PMCID: PMC4804157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
The Continuous-Time Hidden Markov Model (CT-HMM) is an attractive approach to modeling disease progression due to its ability to describe noisy observations arriving irregularly in time. However, the lack of an efficient parameter learning algorithm for CT-HMM restricts its use to very small models or requires unrealistic constraints on the state transitions. In this paper, we present the first complete characterization of efficient EM-based learning methods for CT-HMM models. We demonstrate that the learning problem consists of two challenges: the estimation of posterior state probabilities and the computation of end-state conditioned statistics. We solve the first challenge by reformulating the estimation problem in terms of an equivalent discrete time-inhomogeneous hidden Markov model. The second challenge is addressed by adapting three approaches from the continuous time Markov chain literature to the CT-HMM domain. We demonstrate the use of CT-HMMs with more than 100 states to visualize and predict disease progression using a glaucoma dataset and an Alzheimer's disease dataset.
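The key step in the reformulation is that, between two visits separated by a gap dt, the discrete-time transition matrix of a continuous-time chain with generator Q is the matrix exponential expm(Q·dt). A self-contained sketch with a naive scaling-and-squaring exponential and a tiny hypothetical two-state progression model (a production implementation would use a library expm):

```python
import numpy as np

def transition_matrix(Q, dt, terms=20):
    """Discrete-time transition matrix P = expm(Q * dt) for a CTMC with
    generator Q (off-diagonal rates >= 0, rows summing to zero).

    Uses scaling-and-squaring with a truncated Taylor series, which is
    adequate for the small illustrative Q below.
    """
    A = np.asarray(Q, dtype=float) * dt
    # Scale A down so the Taylor series converges quickly, square back up.
    s = max(0, int(np.ceil(np.log2(max(np.max(np.abs(A)), 1e-12)))) + 1)
    A = A / (2 ** s)
    P = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        P = P + term
    for _ in range(s):
        P = P @ P
    return P

# Hypothetical two-state model: healthy -> diseased at rate 0.5 per year.
Q = [[-0.5, 0.5],
     [0.0, 0.0]]
P = transition_matrix(Q, dt=2.0)  # transition probabilities over a 2-year gap
```

Evaluating this matrix per observed inter-visit gap is what turns the irregularly sampled CT-HMM into an equivalent time-inhomogeneous discrete HMM on which standard EM machinery applies.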
Collapse
|
26
|
Movement Pattern Histogram for Action Recognition and Retrieval. COMPUTER VISION – ECCV 2014 2014. [DOI: 10.1007/978-3-319-10605-2_45] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
27
|
Abstract
As a controllable medium, video-realistic crowds are important for creating the illusion of a populated reality in special effects, games, and architectural visualization. While recent progress in simulation and motion captured-based techniques for crowd synthesis has focused on natural macroscale behavior, this paper addresses the complementary problem of synthesizing crowds with realistic microscale behavior and appearance. Example-based synthesis methods such as video textures are an appealing alternative to conventional model-based methods, but current techniques are unable to represent and satisfy constraints between video sprites and the scene. This paper describes how to synthesize crowds by segmenting pedestrians from input videos of natural crowds and optimally placing them into an output video while satisfying environmental constraints imposed by the scene. We introduce crowd tubes, a representation of video objects designed to compose a crowd of video billboards while avoiding collisions between static and dynamic obstacles. The approach consists of representing crowd tube samples and constraint violations with a conflict graph. The maximal independent set yields a dense constraint-satisfying crowd composition. We present a prototype system for the capture, analysis, synthesis, and control of video-based crowds. Several results demonstrate the system's ability to generate videos of crowds which exhibit a variety of natural behaviors.
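The conflict-graph step can be sketched as follows: nodes are candidate crowd-tube placements, edges mark pairs that would collide, and a maximal independent set yields a dense collision-free composition. This greedy sketch (names and the toy conflicts are illustrative) finds a maximal, though not necessarily maximum, set:

```python
def greedy_independent_set(n, conflicts):
    """Greedy maximal independent set on a conflict graph.

    Nodes 0..n-1 are candidate crowd-tube placements; an edge means two
    placements violate a constraint (e.g. they collide). Keeping every
    node with no previously kept neighbour yields a maximal independent
    set: no further tube can be added without a conflict.
    """
    adj = {i: set() for i in range(n)}
    for a, b in conflicts:
        adj[a].add(b)
        adj[b].add(a)
    kept = []
    for v in range(n):  # a real system might order candidates by density
        if not adj[v] & set(kept):
            kept.append(v)
    return kept

# Five candidate tubes; tube 0 conflicts with 1 and 2, tube 3 with 4.
sel = greedy_independent_set(5, [(0, 1), (0, 2), (3, 4)])
```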
Collapse
|
28
|
C4: a real-time object detection framework. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2013; 22:4096-4107. [PMID: 23797259 DOI: 10.1109/tip.2013.2270111] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
A real-time and accurate object detection framework, C4, is proposed in this paper. C4 achieves 20 fps speed and the state-of-the-art detection accuracy, using only one processing thread without resorting to special hardware such as GPU. The real-time accurate object detection is made possible by two contributions. First, we conjecture (with supporting experiments) that contour is what we should capture and signs of comparisons among neighboring pixels are the key information to capture contour cues. Second, we show that the CENTRIST visual descriptor is suitable for contour based object detection, because it encodes the sign information and can implicitly represent the global contour. When CENTRIST and linear classifier are used, we propose a computational method that does not need to explicitly generate feature vectors. It involves no image preprocessing or feature vector normalization, and only requires O(1) steps to test an image patch. C4 is also friendly to further hardware acceleration. It has been applied to detect objects such as pedestrians, faces, and cars on benchmark data sets. It has comparable detection accuracy with state-of-the-art methods, and has a clear advantage in detection speed.
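The two ideas can be sketched together: the census transform encodes the signs of neighbour comparisons, and with a linear classifier over the 256-bin code histogram a patch's score collapses to a sum of per-pixel weights, which an integral image makes O(1). This is a minimal sketch; the weight values below are hypothetical, not a trained classifier.

```python
def census_transform(img):
    """8-bit census transform: each interior pixel is encoded by the
    signs of comparisons with its 8 neighbours (bit = 1 when the
    neighbour <= center), i.e. the contour cue the paper argues for.
    """
    h, w = len(img), len(img[0])
    ct = [[0] * w for _ in range(h)]
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for dy, dx in offs:
                code = (code << 1) | (img[y + dy][x + dx] <= img[y][x])
            ct[y][x] = code
    return ct

def patch_score(ct, weights, x0, y0, x1, y1):
    # A linear classifier on the histogram of codes reduces to summing
    # weights[code] per pixel; one integral image over that weight map
    # would turn this naive double loop into four O(1) lookups.
    return sum(weights[ct[y][x]] for y in range(y0, y1) for x in range(x0, x1))

img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ct = census_transform(img)
weights = [0.0] * 256          # hypothetical classifier weights
weights[ct[1][1]] = 1.0
score = patch_score(ct, weights, 0, 0, 3, 3)
```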
Collapse
|
29
|
Longitudinal modeling of glaucoma progression using 2-dimensional continuous-time hidden Markov model. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2013; 16:444-51. [PMID: 24579171 PMCID: PMC5988357 DOI: 10.1007/978-3-642-40763-5_55] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
We propose a 2D continuous-time Hidden Markov Model (2D CT-HMM) for glaucoma progression modeling given longitudinal structural and functional measurements. CT-HMM is suitable for modeling longitudinal medical data consisting of visits at arbitrary times, and 2D state structure is more appropriate for glaucoma since the time courses of functional and structural degeneration are usually different. The learned model not only corroborates the clinical findings that structural degeneration is more evident than functional degeneration in early glaucoma and the opposite is observed in more advanced stages, but also reveals the exact stages where the trend reverses. A method to detect time segments of fast progression is also proposed. Our results show that this detector can effectively identify patients with rapid degeneration. The model and the derived detector can be of clinical value for glaucoma monitoring.
Collapse
|
30
|
|
31
|
|
32
|
Computerized macular pathology diagnosis in spectral domain optical coherence tomography scans based on multiscale texture and shape features. Invest Ophthalmol Vis Sci 2011; 52:8316-22. [PMID: 21911579 DOI: 10.1167/iovs.10-7012] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
PURPOSE To develop an automated method to identify the normal macula and three macular pathologies (macular hole [MH], macular edema [ME], and age-related macular degeneration [AMD]) from the fovea-centered cross sections in three-dimensional (3D) spectral-domain optical coherence tomography (SD-OCT) images. METHODS A sample of SD-OCT macular scans (macular cube 200 × 200 or 512 × 128 scan protocol; Cirrus HD-OCT; Carl Zeiss Meditec, Inc., Dublin, CA) was obtained from healthy subjects and subjects with MH, ME, and/or AMD (dataset for development: 326 scans from 136 subjects [193 eyes], and dataset for testing: 131 scans from 37 subjects [58 eyes]). A fovea-centered cross-sectional slice for each of the SD-OCT images was encoded using spatially distributed multiscale texture and shape features. Three ophthalmologists labeled each fovea-centered slice independently, and the majority opinion for each pathology was used as the ground truth. Machine learning algorithms were used to identify the discriminative features automatically. Two-class support vector machine classifiers were trained to identify the presence of normal macula and each of the three pathologies separately. The area under the receiver operating characteristic curve (AUC) was calculated to assess the performance. RESULTS The cross-validation AUC result on the development dataset was 0.976, 0.931, 0.939, and 0.938, and the AUC result on the holdout testing set was 0.978, 0.969, 0.941, and 0.975, for identifying normal macula, MH, ME, and AMD, respectively. CONCLUSIONS The proposed automated data-driven method successfully identified various macular pathologies (all AUC > 0.94). This method may effectively identify the discriminative features without relying on a potentially error-prone segmentation module.
Collapse
|
33
|
|
34
|
CENTRIST: A Visual Descriptor for Scene Categorization. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2011; 33:1489-501. [PMID: 21173449 DOI: 10.1109/tpami.2010.224] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
CENsus TRansform hISTogram (CENTRIST), a new visual descriptor for recognizing topological places or scene categories, is introduced in this paper. We show that place and scene recognition, especially for indoor environments, requires its visual descriptor to possess properties that are different from other vision domains (e.g., object recognition). CENTRIST satisfies these properties and suits the place and scene recognition task. It is a holistic representation and has strong generalizability for category recognition. CENTRIST mainly encodes the structural properties within an image and suppresses detailed textural information. Our experiments demonstrate that CENTRIST outperforms the current state of the art in several place and scene recognition data sets, compared with other descriptors such as SIFT and Gist. In addition, it is easy to implement and extremely fast to evaluate.
Collapse
|
35
|
Automated macular pathology diagnosis in retinal OCT images using multi-scale spatial pyramid and local binary patterns in texture and shape encoding. Med Image Anal 2011; 15:748-59. [PMID: 21737338 DOI: 10.1016/j.media.2011.06.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2011] [Revised: 06/03/2011] [Accepted: 06/06/2011] [Indexed: 11/18/2022]
Abstract
We address a novel problem domain in the analysis of optical coherence tomography (OCT) images: the diagnosis of multiple macular pathologies in retinal OCT images. The goal is to identify the presence of normal macula and each of three types of macular pathologies, namely, macular edema, macular hole, and age-related macular degeneration, in the OCT slice centered at the fovea. We use a machine learning approach based on global image descriptors formed from a multi-scale spatial pyramid. Our local features are dimension-reduced local binary pattern histograms, which are capable of encoding texture and shape information in retinal OCT images and their edge maps, respectively. Our representation operates at multiple spatial scales and granularities, leading to robust performance. We use 2-class support vector machine classifiers to identify the presence of normal macula and each of the three pathologies. To further discriminate sub-types within a pathology, we also build a classifier to differentiate full-thickness holes from pseudo-holes within the macular hole category. We conduct extensive experiments on a large dataset of 326 OCT scans from 136 subjects. The results show that the proposed method is very effective (all AUC>0.93).
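The multi-scale spatial pyramid step can be sketched on its own: per-cell histograms of local codes (e.g. dimension-reduced LBP values) are computed on progressively finer grids and concatenated into one global descriptor. The grid sizes, bin count, and toy input below are illustrative assumptions, not the paper's exact configuration.

```python
def spatial_pyramid(features, levels=2, bins=8):
    """Concatenate per-cell histograms over a multi-scale grid.

    `features` is a 2D grid of integer codes in [0, bins), e.g. reduced
    LBP codes. Level l splits the grid into 2^l x 2^l cells; every cell
    histogram is appended, so coarse levels capture global layout and
    fine levels capture local texture/shape statistics.
    """
    h, w = len(features), len(features[0])
    desc = []
    for level in range(levels):
        n = 2 ** level
        for cy in range(n):
            for cx in range(n):
                hist = [0] * bins
                for y in range(cy * h // n, (cy + 1) * h // n):
                    for x in range(cx * w // n, (cx + 1) * w // n):
                        hist[features[y][x]] += 1
                desc.extend(hist)
    return desc

# Tiny 2x2 code grid with 4 possible codes, two pyramid levels.
grid = [[0, 1], [2, 3]]
desc = spatial_pyramid(grid, levels=2, bins=4)
```

The concatenated vector (here 1 global cell + 4 fine cells, 4 bins each, length 20) is what a 2-class SVM would consume per pathology.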
Collapse
|
36
|
Monitoring dressing activity failures through RFID and video. Methods Inf Med 2011; 51:45-54. [PMID: 21533305 DOI: 10.3414/me10-02-0026] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2010] [Accepted: 12/01/2010] [Indexed: 11/09/2022]
Abstract
BACKGROUND Monitoring and evaluation of Activities of Daily Living in general, and dressing activity in particular, is an important indicator in the evaluation of the overall cognitive state of patients. In addition, the effectiveness of therapy in patients with motor impairments caused by a stroke, for example, can be measured through long-term monitoring of dressing activity. However, automatic monitoring of dressing activity has not received significant attention in the current literature. OBJECTIVES Considering the importance of monitoring dressing activity, the main goal of this work was to investigate the possibility of recognizing dressing activities and automatically identifying common failures exhibited by patients suffering from motor or cognitive impairments. METHODS The system developed for this purpose comprised analysis of RFID (radio frequency identification) tracking and computer vision processing. Eleven test subjects, not connected to the research, were recruited and asked to perform the dressing task by choosing any combination of clothes without further assistance. Initially the test subjects performed correct dressing and then they were free to choose from a set of dressing failures identified from the current research literature. RESULTS The developed system was capable of automatically recognizing common dressing failures. In total, there were four dressing failures observed for upper garments and three failures for lower garments, in addition to recognizing successful dressing. The recognition rate for identified dressing failures was between 80% and 100%. CONCLUSIONS We developed a robust system to monitor the dressing activity. Given the importance of monitoring the dressing activity as an indicator of both cognitive and motor skills the system allows for the possibility of long term tracking and continuous evaluation of the dressing task. Long term monitoring can be used in rehabilitation and cognitive skills evaluation.
Collapse
|
37
|
Abstract
A fundamental requirement of any autonomous robot system is the ability to predict the affordances of its environment. The set of affordances define the actions that are available to the agent given the robot’s context. A standard approach to affordance learning is direct perception, which learns direct mappings from sensor measurements to affordance labels. For example, a robot designed for cross-country navigation could map stereo depth information and image features directly into predictions about the traversability of terrain regions. While this approach can succeed for a small number of affordances, it does not scale well as the number of affordances increases. In this paper, we show that visual object categories can be used as an intermediate representation that makes the affordance learning problem scalable. We develop a probabilistic graphical model which we call the Category—Affordance (CA) model, which describes the relationships between object categories, affordances, and appearance. This model casts visual object categorization as an intermediate inference step in affordance prediction. We describe several novel affordance learning and training strategies that are supported by our new model. Experimental results with indoor mobile robots evaluate these different strategies and demonstrate the advantages of the CA model in affordance learning, especially when learning from limited size data sets.
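The scalability argument reduces to a simple factorization: with object category as an intermediate variable, p(affordance | appearance) = sum over categories c of p(affordance | c) · p(c | appearance). A minimal numeric sketch (all category names and probabilities below are hypothetical, not from the paper's experiments):

```python
def affordance_posterior(p_cat_given_x, p_aff_given_cat):
    """Affordance prediction mediated by object categories:
    p(aff | x) = sum_c p(aff | c) * p(c | x).

    Adding a new affordance only requires the small conditional table
    p(aff | c), not a new mapping from raw appearance -- the property
    that makes this factorization scale with the number of affordances.
    """
    return sum(pc * p_aff_given_cat[c] for c, pc in enumerate(p_cat_given_x))

# Hypothetical scenario: categories (door, wall), affordance 'traversable'.
p_cat = [0.8, 0.2]     # visual posterior over categories given the image
p_trav = [0.9, 0.05]   # probability of traversability given each category
p = affordance_posterior(p_cat, p_trav)
```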
Collapse
|
38
|
Fast asymmetric learning for cascade face detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2008; 30:369-382. [PMID: 18195433 DOI: 10.1109/tpami.2007.1181] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
A cascade face detector uses a sequence of node classifiers to distinguish faces from non-faces. This paper presents a new approach to design node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. There are three contributions in this paper. The first is a categorization of asymmetries in the learning goal and an analysis of why they make face detection hard. The second is the Forward Feature Selection (FFS) algorithm and a fast precomputing strategy for AdaBoost. FFS and the fast AdaBoost can reduce the training time by approximately 100 and 50 times, respectively, in comparison to a naive implementation of the AdaBoost feature selection method. The last contribution is Linear Asymmetric Classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrate experimentally that LAC results in improved ensemble classifier performance.
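The decoupled greedy selection can be sketched as follows, assuming precomputed binary outputs of each weak classifier on the training set and a plain majority-vote ensemble. This is a simplification of FFS (the paper's criterion and precomputation strategy are more involved); every name and the toy data are illustrative.

```python
def forward_feature_selection(preds, labels, k):
    """Greedy forward selection of k weak classifiers for a
    majority-vote ensemble.

    preds[j][i] in {0, 1} is weak classifier j's precomputed output on
    sample i; labels[i] in {0, 1}. Each round keeps the one classifier
    whose addition minimizes the ensemble's training error -- the
    'select first, weight later' decoupling behind FFS.
    """
    n = len(labels)
    votes = [0] * n   # running vote totals of the chosen classifiers
    chosen = []
    for _ in range(k):
        best_j, best_err = None, None
        for j, p in enumerate(preds):
            if j in chosen:
                continue
            t = len(chosen) + 1  # ensemble size if j were added
            err = sum(((votes[i] + p[i]) * 2 > t) != labels[i]
                      for i in range(n))
            if best_err is None or err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
        for i in range(n):
            votes[i] += preds[best_j][i]
    return chosen

# Three precomputed weak classifiers on four samples, pick two.
preds = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
labels = [1, 1, 0, 0]
sel = forward_feature_selection(preds, labels, 2)
```

Because the per-classifier outputs are computed once up front, each selection round is table lookups rather than feature re-evaluation, which is where the training-time savings come from.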
Collapse
|
39
|
Terrain synthesis from digital elevation models. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2007; 13:834-48. [PMID: 17495341 DOI: 10.1109/tvcg.2007.1027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
In this paper, we present an example-based system for terrain synthesis. In our approach, patches from a sample terrain (represented by a height field) are used to generate a new terrain. The synthesis is guided by a user-sketched feature map that specifies where terrain features occur in the resulting synthetic terrain. Our system emphasizes large-scale curvilinear features (ridges and valleys) because such features are the dominant visual elements in most terrains. Both the example height field and user's sketch map are analyzed using a technique from the field of geomorphology. The system finds patches from the example data that match the features found in the user's sketch. Patches are joined together using graph cuts and Poisson editing. The order in which patches are placed in the synthesized terrain is determined by breadth-first traversal of a feature tree and this generates improved results over standard raster-scan placement orders. Our technique supports user-controlled terrain synthesis in a wide variety of styles, based upon the visual richness of real-world terrain data.
Collapse
|
40
|
Shadow elimination and blinding light suppression for interactive projected displays. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2007; 13:508-17. [PMID: 17356217 DOI: 10.1109/tvcg.2007.1007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/14/2023]
Abstract
A major problem with interactive displays based on front projection is that users cast undesirable shadows on the display surface. This paper demonstrates that shadows can be muted by redundantly illuminating the display surface using multiple projectors, all mounted at different locations. However, this technique alone does not eliminate shadows: Multiple projectors create multiple dark regions on the surface (penumbral occlusions) and cast undesirable light onto the users. These problems can be solved by eliminating shadows and suppressing the light that falls on occluding users by actively modifying the projected output. This paper categorizes various methods that can be used to achieve redundant illumination, shadow elimination, and blinding light suppression and evaluates their performance.
Collapse
|
41
|
|
42
|
A Bayesian multiple-hypothesis approach to edge grouping and contour segmentation. Int J Comput Vis 1993. [DOI: 10.1007/bf01420590] [Citation(s) in RCA: 51] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|