1
|
McMullin MA, Kumar R, Higgins NC, Gygi B, Elhilali M, Snyder JS. Preliminary Evidence for Global Properties in Human Listeners During Natural Auditory Scene Perception. Open Mind (Camb) 2024; 8:333-365. [PMID: 38571530 PMCID: PMC10990578 DOI: 10.1162/opmi_a_00131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Accepted: 02/10/2024] [Indexed: 04/05/2024] Open
Abstract
Theories of auditory and visual scene analysis suggest the perception of scenes relies on the identification and segregation of objects within it, resembling a detail-oriented processing style. However, a more global process may occur while analyzing scenes, which has been evidenced in the visual domain. It is our understanding that a similar line of research has not been explored in the auditory domain; therefore, we evaluated the contributions of high-level global and low-level acoustic information to auditory scene perception. An additional aim was to increase the field's ecological validity by using and making available a new collection of high-quality auditory scenes. Participants rated scenes on 8 global properties (e.g., open vs. enclosed) and an acoustic analysis evaluated which low-level features predicted the ratings. We submitted the acoustic measures and average ratings of the global properties to separate exploratory factor analyses (EFAs). The EFA of the acoustic measures revealed a seven-factor structure explaining 57% of the variance in the data, while the EFA of the global property measures revealed a two-factor structure explaining 64% of the variance in the data. Regression analyses revealed each global property was predicted by at least one acoustic variable (R2 = 0.33-0.87). These findings were extended using deep neural network models where we examined correlations between human ratings of global properties and deep embeddings of two computational models: an object-based model and a scene-based model. The results support that participants' ratings are more strongly explained by a global analysis of the scene setting, though the relationship between scene perception and auditory perception is multifaceted, with differing correlation patterns evident between the two models. Taken together, our results provide evidence for the ability to perceive auditory scenes from a global perspective. Some of the acoustic measures predicted ratings of global scene perception, suggesting representations of auditory objects may be transformed through many stages of processing in the ventral auditory stream, similar to what has been proposed in the ventral visual stream. These findings and the open availability of our scene collection will make future studies on perception, attention, and memory for natural auditory scenes possible.
Collapse
Affiliation(s)
| | - Rohit Kumar
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Nathan C. Higgins
- Department of Communication Sciences & Disorders, University of South Florida, Tampa, FL, USA
| | - Brian Gygi
- East Bay Institute for Research and Education, Martinez, CA, USA
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, NV, USA
| |
Collapse
|
2
|
Gandolfo M, Abassi E, Balgova E, Downing PE, Papeo L, Koldewyn K. Converging evidence that left extrastriate body area supports visual sensitivity to social interactions. Curr Biol 2024; 34:343-351.e5. [PMID: 38181794 DOI: 10.1016/j.cub.2023.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2023] [Revised: 11/25/2023] [Accepted: 12/05/2023] [Indexed: 01/07/2024]
Abstract
Navigating our complex social world requires processing the interactions we observe. Recent psychophysical and neuroimaging studies provide parallel evidence that the human visual system may be attuned to efficiently perceive dyadic interactions. This work implies, but has not yet demonstrated, that activity in body-selective cortical regions causally supports efficient visual perception of interactions. We adopt a multi-method approach to close this important gap. First, using a large fMRI dataset (n = 92), we found that the left hemisphere extrastriate body area (EBA) responds more to face-to-face than non-facing dyads. Second, we replicated a behavioral marker of visual sensitivity to interactions: categorization of facing dyads is more impaired by inversion than non-facing dyads. Third, in a pre-registered experiment, we used fMRI-guided transcranial magnetic stimulation to show that online stimulation of the left EBA, but not a nearby control region, abolishes this selective inversion effect. Activity in left EBA, thus, causally supports the efficient perception of social interactions.
Collapse
Affiliation(s)
- Marco Gandolfo
- Donders Institute, Radboud University, Nijmegen 6525GD, the Netherlands; Department of Psychology, Bangor University, Bangor LL572AS, Gwynedd, UK.
| | - Etienne Abassi
- Institut des Sciences Cognitives, Marc Jeannerod, Lyon 69500, France
| | - Eva Balgova
- Department of Psychology, Bangor University, Bangor LL572AS, Gwynedd, UK; Department of Psychology, Aberystwyth University, Aberystwyth SY23 3UX, Ceredigion, UK
| | - Paul E Downing
- Department of Psychology, Bangor University, Bangor LL572AS, Gwynedd, UK
| | - Liuba Papeo
- Institut des Sciences Cognitives, Marc Jeannerod, Lyon 69500, France
| | - Kami Koldewyn
- Department of Psychology, Bangor University, Bangor LL572AS, Gwynedd, UK.
| |
Collapse
|
3
|
Peelen MV, Berlot E, de Lange FP. Predictive processing of scenes and objects. NATURE REVIEWS PSYCHOLOGY 2024; 3:13-26. [PMID: 38989004 PMCID: PMC7616164 DOI: 10.1038/s44159-023-00254-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 10/25/2023] [Indexed: 07/12/2024]
Abstract
Real-world visual input consists of rich scenes that are meaningfully composed of multiple objects which interact in complex, but predictable, ways. Despite this complexity, we recognize scenes, and objects within these scenes, from a brief glance at an image. In this review, we synthesize recent behavioral and neural findings that elucidate the mechanisms underlying this impressive ability. First, we review evidence that visual object and scene processing is partly implemented in parallel, allowing for a rapid initial gist of both objects and scenes concurrently. Next, we discuss recent evidence for bidirectional interactions between object and scene processing, with scene information modulating the visual processing of objects, and object information modulating the visual processing of scenes. Finally, we review evidence that objects also combine with each other to form object constellations, modulating the processing of individual objects within the object pathway. Altogether, these findings can be understood by conceptualizing object and scene perception as the outcome of a joint probabilistic inference, in which "best guesses" about objects act as priors for scene perception and vice versa, in order to concurrently optimize visual inference of objects and scenes.
Collapse
Affiliation(s)
- Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Eva Berlot
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| | - Floris P de Lange
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
| |
Collapse
|
4
|
Brandman T, Peelen MV. Objects sharpen visual scene representations: evidence from MEG decoding. Cereb Cortex 2023; 33:9524-9531. [PMID: 37365829 PMCID: PMC10431745 DOI: 10.1093/cercor/bhad222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Revised: 06/02/2023] [Accepted: 06/03/2023] [Indexed: 06/28/2023] Open
Abstract
Real-world scenes consist of objects, defined by local information, and scene background, defined by global information. Although objects and scenes are processed in separate pathways in visual cortex, their processing interacts. Specifically, previous studies have shown that scene context makes blurry objects look sharper, an effect that can be observed as a sharpening of object representations in visual cortex from around 300 ms after stimulus onset. Here, we use MEG to show that objects can also sharpen scene representations, with the same temporal profile. Photographs of indoor (closed) and outdoor (open) scenes were blurred such that they were difficult to categorize on their own but easily disambiguated by the inclusion of an object. Classifiers were trained to distinguish MEG response patterns to intact indoor and outdoor scenes, presented in an independent run, and tested on degraded scenes in the main experiment. Results revealed better decoding of scenes with objects than scenes alone and objects alone from 300 ms after stimulus onset. This effect was strongest over left posterior sensors. These findings show that the influence of objects on scene representations occurs at similar latencies as the influence of scenes on object representations, in line with a common predictive processing mechanism.
Collapse
Affiliation(s)
- Talia Brandman
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GD, The Netherlands
| | - Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen 6525 GD, The Netherlands
| |
Collapse
|
5
|
Bracci S, Mraz J, Zeman A, Leys G, Op de Beeck H. The representational hierarchy in human and artificial visual systems in the presence of object-scene regularities. PLoS Comput Biol 2023; 19:e1011086. [PMID: 37115763 PMCID: PMC10171658 DOI: 10.1371/journal.pcbi.1011086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 05/10/2023] [Accepted: 04/09/2023] [Indexed: 04/29/2023] Open
Abstract
Human vision is still largely unexplained. Computer vision made impressive progress on this front, but it is still unclear to which extent artificial neural networks approximate human object vision at the behavioral and neural levels. Here, we investigated whether machine object vision mimics the representational hierarchy of human object vision with an experimental design that allows testing within-domain representations for animals and scenes, as well as across-domain representations reflecting their real-world contextual regularities such as animal-scene pairs that often co-occur in the visual environment. We found that DCNNs trained in object recognition acquire representations, in their late processing stage, that closely capture human conceptual judgements about the co-occurrence of animals and their typical scenes. Likewise, the DCNNs representational hierarchy shows surprising similarities with the representational transformations emerging in domain-specific ventrotemporal areas up to domain-general frontoparietal areas. Despite these remarkable similarities, the underlying information processing differs. The ability of neural networks to learn a human-like high-level conceptual representation of object-scene co-occurrence depends upon the amount of object-scene co-occurrence present in the image set thus highlighting the fundamental role of training history. Further, although mid/high-level DCNN layers represent the category division for animals and scenes as observed in VTC, its information content shows reduced domain-specific representational richness. To conclude, by testing within- and between-domain selectivity while manipulating contextual regularities we reveal unknown similarities and differences in the information processing strategies employed by human and artificial visual systems.
Collapse
Affiliation(s)
- Stefania Bracci
- Center for Mind/Brain Sciences-CIMeC, University of Trento, Rovereto, Italy
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Jakob Mraz
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Astrid Zeman
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Gaëlle Leys
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| | - Hans Op de Beeck
- KU Leuven, Leuven Brain Institute, Brain & Cognition Research Unit, Leuven, Belgium
| |
Collapse
|
6
|
Furtak M, Mudrik L, Bola M. The forest, the trees, or both? Hierarchy and interactions between gist and object processing during perception of real-world scenes. Cognition 2021; 221:104983. [PMID: 34968994 DOI: 10.1016/j.cognition.2021.104983] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Revised: 11/30/2021] [Accepted: 12/02/2021] [Indexed: 11/26/2022]
Abstract
The global-to-local theories of perception assume that the gist of a scene is computed early and automatically, whereas recognition of objects occurs at a later processing stage, requires attentional resources, and is primed by the representation of gist. To test these theoretical predictions, we investigated the processing hierarchy of gist- and object-recognition and their interaction in two experiments (total N = 60). Backward-masked images of real-world scenes were presented for a range of brief durations - between 8 ms and 100 ms, and participants performed either an object or a background classification task, in separate blocks. We report three main findings. First, scenes' backgrounds were generally classified more accurately than foreground objects, but recognition of objects was boosted to the same level as backgrounds by cueing spatial attention to the exact object's location. Second, backgrounds influence objects' recognition, as objects presented within semantically incongruent backgrounds were classified less accurately. Third, objects influence background categorization, as backgrounds comprising incongruent objects were also classified less accurately. Therefore, the first two findings support the global-to-local theories, implying that gists are indeed more readily perceived than objects, probably at an earlier stage. Yet the latter finding that objects also influence gist recognition suggests a more parallel and interactive view of both processes than previously assumed.
Collapse
Affiliation(s)
- Marcin Furtak
- Laboratory of Brain Imaging, Nencki Institute of Experimental Biology of Polish Academy of Sciences, 3 Pasteur Street, 02-093 Warsaw, Poland
| | - Liad Mudrik
- School of Psychological Science, Tel Aviv University, PO Box 39040, Tel Aviv, Israel; Sagol School of Neuroscience, Tel Aviv University, PO Box 39040, Tel Aviv, Israel
| | - Michał Bola
- Laboratory of Brain Imaging, Nencki Institute of Experimental Biology of Polish Academy of Sciences, 3 Pasteur Street, 02-093 Warsaw, Poland.
| |
Collapse
|
7
|
Wurm MF, Caramazza A. Two 'what' pathways for action and object recognition. Trends Cogn Sci 2021; 26:103-116. [PMID: 34702661 DOI: 10.1016/j.tics.2021.10.003] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2021] [Revised: 09/03/2021] [Accepted: 10/01/2021] [Indexed: 10/20/2022]
Abstract
The ventral visual stream is conceived as a pathway for object recognition. However, we also recognize the actions an object can be involved in. Here, we show that action recognition critically depends on a pathway in lateral occipitotemporal cortex, partially overlapping and topographically aligned with object representations that are precursors for action recognition. By contrast, object features that are more relevant for object recognition, such as color and texture, are typically found in ventral occipitotemporal cortex. We argue that occipitotemporal cortex contains similarly organized lateral and ventral 'what' pathways for action and object recognition, respectively. This account explains a number of observed phenomena, such as the duplication of object domains and the specific representational profiles in lateral and ventral cortex.
Collapse
Affiliation(s)
- Moritz F Wurm
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Corso Bettini 31, 38068 Rovereto, Italy.
| | - Alfonso Caramazza
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Corso Bettini 31, 38068 Rovereto, Italy; Department of Psychology, Harvard University, 33 Kirkland St, Cambridge, MA 02138, USA
| |
Collapse
|
8
|
Wischnewski M, Peelen MV. Causal neural mechanisms of context-based object recognition. eLife 2021; 10:69736. [PMID: 34374647 PMCID: PMC8354632 DOI: 10.7554/elife.69736] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2021] [Accepted: 07/26/2021] [Indexed: 12/05/2022] Open
Abstract
Objects can be recognized based on their intrinsic features, including shape, color, and texture. In daily life, however, such features are often not clearly visible, for example when objects appear in the periphery, in clutter, or at a distance. Interestingly, object recognition can still be highly accurate under these conditions when objects are seen within their typical scene context. What are the neural mechanisms of context-based object recognition? According to parallel processing accounts, context-based object recognition is supported by the parallel processing of object and scene information in separate pathways. Output of these pathways is then combined in downstream regions, leading to contextual benefits in object recognition. Alternatively, according to feedback accounts, context-based object recognition is supported by (direct or indirect) feedback from scene-selective to object-selective regions. Here, in three pre-registered transcranial magnetic stimulation (TMS) experiments, we tested a key prediction of the feedback hypothesis: that scene-selective cortex causally and selectively supports context-based object recognition before object-selective cortex does. Early visual cortex (EVC), object-selective lateral occipital cortex (LOC), and scene-selective occipital place area (OPA) were stimulated at three time points relative to stimulus onset while participants categorized degraded objects in scenes and intact objects in isolation, in different trials. Results confirmed our predictions: relative to isolated object recognition, context-based object recognition was selectively and causally supported by OPA at 160–200 ms after onset, followed by LOC at 260–300 ms after onset. These results indicate that context-based expectations facilitate object recognition by disambiguating object representations in the visual cortex.
Collapse
Affiliation(s)
- Miles Wischnewski
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands.,Department of Biomedical Engineering, University of Minnesota, Minneapolis, United States
| | - Marius V Peelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
| |
Collapse
|