1
Theuer JK, Koch NN, Gumbsch C, Elsner B, Butz MV. Infants infer and predict coherent event interactions: Modeling cognitive development. PLoS One 2024; 19:e0312532. PMID: 39446862; PMCID: PMC11500850; DOI: 10.1371/journal.pone.0312532.
Abstract
Mental representations of the environment in infants are sparse and grow richer during their development. Anticipatory eye fixation studies show that infants aged around 7 months start to predict the goal of an observed action, e.g., an object targeted by a reaching hand. Interestingly, goal-predictive gaze shifts occur at an earlier age when the hand subsequently manipulates the object, and at a later age when the action is performed by an inanimate actor, e.g., a mechanical claw. We introduce CAPRI2 (Cognitive Action PRediction and Inference in Infants), a computational model that explains this development from a functional, algorithmic perspective. It is based on the theory that infants learn object files and events as they develop a physical reasoning system. In particular, CAPRI2 learns a generative event-predictive model, which it uses to both interpret sensory information and infer goal-directed behavior. When observing object interactions, CAPRI2 (i) interprets the unfolding interactions in terms of event-segmented dynamics, (ii) maximizes the coherence of its event interpretations, updating its internal estimates, and (iii) chooses gaze behavior to minimize expected uncertainty. As a result, CAPRI2 mimics the developmental pathway of infants' goal-predictive gaze behavior. Our modeling work suggests that the involved event-predictive representations, longer-term generative model learning, and shorter-term retrospective and active inference principles constitute fundamental building blocks for the effective development of goal-predictive capacities.
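The abstract describes CAPRI2's processing loop only verbally. As a rough illustration of how such a loop can be organized, the following Python sketch combines Bayesian belief updating over discrete event hypotheses with an uncertainty-minimizing gaze choice. It is not the authors' implementation; the event set, prototype dynamics, and all names (`EVENTS`, `EVENT_PROTOTYPES`, `update_belief`, `choose_gaze`) are hypothetical stand-ins assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete event hypotheses the model can entertain.
EVENTS = ["reach", "transport", "idle"]
# Stand-in "event dynamics": each hypothesis predicts a 2-D location.
EVENT_PROTOTYPES = rng.normal(size=(len(EVENTS), 2))

def likelihood(observation):
    """Score how well an observed 2-D location matches each event's prediction."""
    return np.exp(-np.sum((observation - EVENT_PROTOTYPES) ** 2, axis=1))

def update_belief(belief, observation):
    """Retrospective inference: re-weight event hypotheses so the interpretation
    stays coherent with what was just observed (a posterior update)."""
    posterior = belief * likelihood(observation)
    return posterior / posterior.sum()

def entropy(belief):
    return -np.sum(belief * np.log(belief + 1e-12))

def choose_gaze(belief, candidate_locations):
    """Active-inference-style gaze choice: look where an imagined observation
    would leave the least remaining uncertainty about the ongoing event."""
    return min(candidate_locations, key=lambda loc: entropy(update_belief(belief, loc)))

# One simulated trial: uniform prior over events, a noisy "reach" unfolding.
belief = np.ones(len(EVENTS)) / len(EVENTS)
for t in range(5):
    observation = EVENT_PROTOTYPES[0] + 0.1 * rng.normal(size=2)
    belief = update_belief(belief, observation)
    gaze = choose_gaze(belief, candidate_locations=list(EVENT_PROTOTYPES))
    print(t, dict(zip(EVENTS, belief.round(3))), "gaze ->", gaze.round(2))
```

The design choice mirrored here is that gaze is driven by the model's own uncertainty rather than by visual salience: candidate locations are scored by the posterior entropy an imagined observation there would leave behind.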
Affiliation(s)
- Johanna K. Theuer
- Neuro-Cognitive Modeling, Department of Computer Science and Department of Psychology, University of Tübingen, Tübingen, Germany
- Nadine N. Koch
- Neuro-Cognitive Modeling, Department of Computer Science and Department of Psychology, University of Tübingen, Tübingen, Germany
- Christian Gumbsch
- Neuro-Cognitive Modeling, Department of Computer Science and Department of Psychology, University of Tübingen, Tübingen, Germany
- Chair of Cognitive and Clinical Neuroscience, Faculty of Psychology, Technical University Dresden, Dresden, Germany
- Birgit Elsner
- Developmental Psychology, Faculty of Humanities, University of Potsdam, Potsdam, Germany
- Martin V. Butz
- Neuro-Cognitive Modeling, Department of Computer Science and Department of Psychology, University of Tübingen, Tübingen, Germany
2
Ikuta H, Wöhler L, Aizawa K. Statistical characteristics of comic panel viewing times. Sci Rep 2023; 13:20291. PMID: 37985682; PMCID: PMC10661992; DOI: 10.1038/s41598-023-47120-w.
Abstract
Comics are a bimodal art form that mixes text and images. Since comprehending comics requires a combination of various cognitive processes, analyzing human comic-reading behavior sheds light on how humans process such bimodal media. In this paper, we focus on the viewing time of each comic panel as a quantitative measure of attention and analyze the statistical characteristics of the distributions of comic panel viewing times. We created a user interface that presents comics panel by panel and measured the viewing time of each panel in a user study. We collected data from 18 participants reading 7 comic book volumes, resulting in over 99,000 viewing-time data points, which will be released publicly. The results show that average viewing times are proportional to the text length contained in a panel's speech bubbles, with the constant of proportionality differing for each reader, despite the bimodal setting. Additionally, we find that viewing times for all users follow a common heavy-tailed distribution.
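The two statistical claims in this abstract (per-reader proportionality between viewing time and text length, and a shared heavy-tailed distribution) can be checked with a few lines of analysis code. The sketch below is a minimal illustration on synthetic data, not the released dataset; the record layout, reader count, and the choice of a lognormal as the heavy-tailed candidate are assumptions, so the distribution actually reported in the paper may differ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical per-panel records: (reader_id, text_length_chars, viewing_time_s).
records = [
    (reader, int(n), float(np.exp(rng.normal(np.log(0.02 * (reader + 1) * n + 0.5), 0.4))))
    for reader in range(3)
    for n in rng.integers(5, 120, size=200)
]

# Per-reader proportionality: slope of viewing time vs. text length.
for reader in sorted({r for r, _, _ in records}):
    lengths = np.array([n for r, n, _ in records if r == reader], dtype=float)
    times = np.array([t for r, _, t in records if r == reader])
    slope, intercept = np.polyfit(lengths, times, 1)
    print(f"reader {reader}: {slope * 1000:.1f} ms per character (intercept {intercept:.2f} s)")

# Pooled heavy-tail check: compare a lognormal fit against a light-tailed exponential.
times_all = np.array([t for _, _, t in records])
ln_params = stats.lognorm.fit(times_all, floc=0)
exp_params = stats.expon.fit(times_all, floc=0)
print("lognormal log-likelihood:  ", stats.lognorm.logpdf(times_all, *ln_params).sum())
print("exponential log-likelihood:", stats.expon.logpdf(times_all, *exp_params).sum())
```

On data like these the lognormal should dominate; a careful comparison would use held-out likelihood or an information criterion rather than raw in-sample log-likelihood.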
Affiliation(s)
- Hikaru Ikuta
- Department of Information and Communication Engineering, The University of Tokyo, Tokyo, 113-8656, Japan.
- Leslie Wöhler
- JSPS International Research Fellow, The University of Tokyo, Tokyo, 113-8656, Japan
- Kiyoharu Aizawa
- Department of Information and Communication Engineering, The University of Tokyo, Tokyo, 113-8656, Japan
3
Klomberg B, Hacımusaoğlu I, Cohn N. Running through the Who, Where, and When: A Cross-cultural Analysis of Situational Changes in Comics. Discourse Processes 2022. DOI: 10.1080/0163853x.2022.2106402.
Affiliation(s)
- Bien Klomberg
- Department of Communication and Cognition, Tilburg University, Tilburg School of Humanities and Digital Sciences
- Irmak Hacımusaoğlu
- Department of Communication and Cognition, Tilburg University, Tilburg School of Humanities and Digital Sciences
- Neil Cohn
- Department of Communication and Cognition, Tilburg University, Tilburg School of Humanities and Digital Sciences
4
Cohn N. A starring role for inference in the neurocognition of visual narratives. Cognitive Research: Principles and Implications 2021; 6:8. PMID: 33587244; PMCID: PMC7884514; DOI: 10.1186/s41235-021-00270-9.
Abstract
Research in verbal and visual narratives has often emphasized backward-looking inferences, where absent information is subsequently inferred. However, comics use conventions like star-shaped “action stars,” where a reader knows that events are undepicted at that moment rather than omitted entirely. We contrasted the event-related brain potentials (ERPs) to visual narratives depicting an explicit event, an action star, or a “noise” panel of scrambled lines. Both action stars and noise panels evoked large N400s compared to explicit events (300–500 ms), but action stars and noise panels then differed in their later effects (500–900 ms). Action stars elicited sustained negativities and P600s, which could indicate further interpretive processes and integration of meaning into a mental model, while noise panels evoked late frontal positivities possibly indexing that they were improbable narrative units. Nevertheless, panels following action stars and noise panels both evoked late sustained negativities, implying further inferential processing. Inference in visual narratives thus uses cascading mechanisms resembling those in language processing, which differ depending on the inferential technique used.
Affiliation(s)
- Neil Cohn
- Department of Communication and Cognition, Tilburg School of Humanities and Digital Sciences, Tilburg University, P.O. Box 90153, 5000 LE, Tilburg, The Netherlands.
5
Magliano JP, Kurby CA, Ackerman T, Garlitch SM, Stewart JM. Lights, camera, action: the role of editing and framing on the processing of filmed events. Journal of Cognitive Psychology 2020. DOI: 10.1080/20445911.2020.1796685.
Affiliation(s)
- Joseph P. Magliano
- Department of Learning Sciences, Georgia State University, Atlanta, GA, USA
- Thomas Ackerman
- School of Filmmaking, The University of North Carolina School of the Arts, Winston-Salem, NC, USA
- Sydney M. Garlitch
- Department of Psychology, University of North Carolina at Greensboro, Greensboro, NC, USA
- J. Mac Stewart
- Department of Psychology, Grand Valley State University, Grand Rapids, MI, USA
6
Abstract
Visual narratives communicate event sequences by using different code systems, such as pictures and text. Thus, comprehenders must integrate information from different codalities. This study addressed such cross-codal integration processes by investigating how the codality of bridging-event information (i.e., pictures, text) affects the understanding of visual narrative events. In Experiment 1, bridging-event information was either present (as picture or text) or absent (i.e., not shown). The viewing times for the subsequent picture depicting the end state of the action were comparable between the absent and text conditions. Further, the viewing times for the end-state picture were significantly longer in the text condition than in the pictorial condition. In Experiment 2, we tested whether replacing bridging-event information with a blank panel increases viewing times in a way similar to the text condition. Bridging-event information was either present (as picture) or absent (not shown vs. blank panel). The results replicated those of Experiment 1. Additionally, the viewing times for the end-state pictures were longest in the blank condition. In Experiment 3, we investigated the costs related to integrating information from different codalities by directly comparing the text and picture conditions with the blank condition. The results showed that the distortion caused by the blank panel is larger than the distortion caused by cross-codal integration processes. In summary, we conclude that cross-codal information processing during narrative comprehension is possible but associated with additional mental effort. We discuss the results with regard to theories of narrative understanding.
7
Cohn N, Magliano JP. Editors’ Introduction and Review: Visual Narrative Research: An Emerging Field in Cognitive Science. Top Cogn Sci 2019; 12:197-223. PMID: 31865641; PMCID: PMC9328199; DOI: 10.1111/tops.12473.
Abstract
Drawn sequences of images are among our oldest records of human intelligence, appearing on cave paintings, wall carvings, and ancient pottery, and they pervade across cultures from instruction manuals to comics. They also appear prevalently as stimuli across Cognitive Science, for studies of temporal cognition, event structure, social cognition, discourse, and basic intelligence. Yet, despite this fundamental place in human expression and research on cognition, the study of visual narratives themselves has only recently gained traction in Cognitive Science. This work has suggested that visual narrative comprehension requires cultural exposure across a developmental trajectory and engages with domain‐general processing mechanisms shared by visual perception, attention, event cognition, and language, among others. Here, we review the relevance of such research for the broader Cognitive Science community, and make the case for why researchers should join the scholarship of this ubiquitous but understudied aspect of human expression. Drawn sequences of images, like those in comics and picture stories, are a pervasive and fundamental way that humans have communicated for millennia. Yet, the study of visual narratives has only recently gained traction in Cognitive Science. Here we explore what has held back the study of the cognition of visual narratives, and why researchers should join in scholarship of this ubiquitous aspect of expression.
Affiliation(s)
- Neil Cohn
- Department of Communication and Cognition, Tilburg School of Humanities and Digital Sciences, Tilburg Center for Cognition and Communication, Tilburg University
- Joseph P. Magliano
- Department of Learning Sciences at the College of Education & Human Development, Georgia State University
8
Kopatich RD, Feller DP, Kurby CA, Magliano JP. The role of character goals and changes in body position in the processing of events in visual narratives. Cognitive Research: Principles and Implications 2019; 4:22. PMID: 31286278; PMCID: PMC6614232; DOI: 10.1186/s41235-019-0176-1.
Abstract
BACKGROUND: A growing body of research is beginning to understand how people comprehend sequential visual narratives. However, previous work has used materials that primarily rely on visual information (i.e., they contain minimal language information). The current work seeks to address how visual and linguistic information streams are coordinated in sequential image comprehension. In Experiment 1, participants viewed picture stories and engaged in an event segmentation task. The extent to which critical points in the narrative depicted situational continuity of character goals and continuity in bodily position was manipulated. The likelihood of perceiving an event boundary and viewing latencies at critical locations were measured. Experiment 1 was replicated in the second experiment without the segmentation task; that is, participants read the picture stories without deciding where the event boundaries occurred.
RESULTS: Experiment 1 indicated that changes in character goals were associated with an increased likelihood of segmenting at the critical point, but changes in bodily position were not. A follow-up analysis, however, revealed that over the course of the entire story, changes in body position were a significant predictor of event segmentation. Viewing time, however, was affected by both goal and body-position shifts. Experiment 2 corroborated the finding that viewing time was affected by changes in goals and body positions.
CONCLUSION: The current study shows that changes in body position influence a viewer's perception of event structure and event processing. This fits into a growing body of research that attempts to understand how consumers of multimodal media coordinate multiple information streams. The current study underscores the need for the systematic study of the visual, perceptual, and comprehension processes that occur during visual narrative understanding.
9
Cohn N. Your Brain on Comics: A Cognitive Model of Visual Narrative Comprehension. Top Cogn Sci 2019; 12:352-386. PMID: 30963724; PMCID: PMC9328425; DOI: 10.1111/tops.12421.
Abstract
The past decade has seen a rapid growth of cognitive and brain research focused on visual narratives like comics and picture stories. This paper will summarize and integrate this emerging literature into the Parallel Interfacing Narrative‐Semantics Model (PINS Model)—a theory of sequential image processing characterized by an interaction between two representational levels: semantics and narrative structure. Ongoing semantic processes build meaning into an evolving mental model of a visual discourse. Updating of spatial, referential, and event information then incurs costs when they are discontinuous with the growing context. In parallel, a narrative structure organizes semantic information into coherent sequences by assigning images to categorical roles, which are then embedded within a hierarchic constituent structure. Narrative constructional schemas allow for specific predictions of structural sequencing, independent of semantics. Together, these interacting levels of representation engage in an iterative process of retrieval of semantic and narrative information, prediction of upcoming information based on those assessments, and subsequent updating based on discontinuity. These core mechanisms are argued to be domain‐general—spanning across expressive systems—as suggested by similar electrophysiological brain responses (N400, P600, anterior negativities) generated in response to manipulation of sequential images, music, and language. Such similarities between visual narratives and other domains thus pose fundamental questions for the linguistic and cognitive sciences. Visual narratives like comics involve a range of complex cognitive operations in order to be understood. The Parallel Interfacing Narrative‐Semantics (PINS) Model integrates an emerging literature showing that comprehension of wordless image sequences balances two representational levels of semantic and narrative structure. The neurocognitive mechanisms that guide these processes are argued to overlap with other domains, such as language and music.
Affiliation(s)
- Neil Cohn
- Department of Communication and Cognition, Tilburg University
10
Attentional profiles linked to event segmentation are robust to missing information. Cognitive Research: Principles and Implications 2019; 4:8. PMID: 30830507; PMCID: PMC6399362; DOI: 10.1186/s41235-019-0157-4.
Abstract
Everyday experience consists of rapidly unfolding sensory information that humans redescribe as discrete events. Quick and efficient redescription facilitates remembering, responding to, and learning from the ongoing sensory flux. Segmentation seems key to successful redescription: the extent to which viewers can identify boundaries between event units within continuously unfolding activities predicts both memory and action performance. However, what happens to processing when boundary content is missing? Events occurring in naturalistic situations seldom receive continuous undivided attention. As a consequence, information, including boundary content, is likely sometimes missed. In this research, we systematically explored the influence of missing information by asking participants to advance at their own pace through a series of slideshows. Some slideshows, while otherwise matched in content, contained just half of the slides present in other slideshows. Missing content sometimes occurred at boundaries. As it turned out, patterns of attention during slideshow viewing were strikingly similar across matched slideshows despite missing content, even when missing content occurred at boundaries. Moreover, to the extent that viewers compensated with increased attention, missing content did not significantly undercut event recall. These findings seem to further confirm an information optimization account of event processing: event boundaries receive heightened attention because they forecast unpredictability and thus, optimize the uptake of new information. Missing boundary content sparks little change in patterns of attentional modulation, presumably because the underlying predictability parameters of the unfolding activity itself are unchanged by missing content. Optimizing information, thus, enables event processing and recall to be impressively resilient to missing content.
11
Papenmeier F, Brockhoff A, Huff M. Filling the gap despite full attention: the role of fast backward inferences for event completion. Cognitive Research: Principles and Implications 2019; 4:3. PMID: 30693396; PMCID: PMC6352563; DOI: 10.1186/s41235-018-0151-2.
Abstract
The comprehension of dynamic naturalistic events poses at least two challenges to the cognitive system: filtering relevant information with attention and dealing with information that was missing or missed. With four experiments, we studied the completion of missing information despite full attention. Participants watched short soccer video clips and we informed participants that we removed a critical moment of ball contact in half of the clips. We asked participants to detect whether these moments of ball contact were present or absent. In Experiment 1, participants gave their detection responses either directly during an event or delayed after an event. Although participants directed their full attention toward the critical contact moment, they were more likely to indicate seeing the missing ball contact if it was followed by a causally matching scene than if it was followed by an unrelated scene, both for the immediate and delayed responses. Thus, event completion occurs quickly. In Experiment 2, only a causally matching scene but neither a white mask nor an irrelevant scene caused the completion of missing information. This indicates that the completion of missing information is caused by backward inferences rather than predictive perception. In Experiment 3, we showed that event completion occurs directly during a trial and does not depend on expectations built up after seeing the same causality condition multiple times. In Experiment 4, we linked our findings to event cognition by asking participants to perform a natural segmentation task. We conclude that observers complete missing information during coherent events based on a fast backward inference mechanism even when directing their attention toward the missing information.
Affiliation(s)
- Frank Papenmeier
- Department of Psychology, University of Tübingen, Schleichstr. 4, 72076, Tübingen, Germany.
- Alisa Brockhoff
- Department of Psychology, University of Tübingen, Schleichstr. 4, 72076, Tübingen, Germany
- Markus Huff
- Department of Research Infrastructures, German Institute for Adult Education, Heinemannstraße 12-14, 53175, Bonn, Germany
12
Cohn N. Visual narratives and the mind: Comprehension, cognition, and learning. Psychology of Learning and Motivation 2019. DOI: 10.1016/bs.plm.2019.02.002.
13
Hutson JP, Magliano JP, Loschky LC. Understanding Moment-to-Moment Processing of Visual Narratives. Cogn Sci 2018; 42:2999-3033. PMID: 30447018; PMCID: PMC6587724; DOI: 10.1111/cogs.12699.
Abstract
What role do moment‐to‐moment comprehension processes play in visual attentional selection in picture stories? The current work uniquely tested the role of bridging inference generation processes on eye movements while participants viewed picture stories. Specific components of the Scene Perception and Event Comprehension Theory (SPECT) were tested. Bridging inference generation was induced by manipulating the presence of highly inferable actions embedded in picture stories. When inferable actions are missing, participants have increased viewing times for the immediately following critical image (Magliano, Larson, Higgs, & Loschky, 2016). This study used eye‐tracking to test competing hypotheses about the increased viewing time: (a) Computational Load: inference generation processes increase overall computational load, producing longer fixation durations; (b) Visual Search: inference generation processes guide eye‐movements to pick up inference‐relevant information, producing more fixations. Participants had similar fixation durations, but they made more fixations while generating inferences, with that process starting from the fifth fixation. A follow‐up hypothesis predicted that when generating inferences, participants fixate scene regions important for generating the inference. A separate group of participants rated the inferential‐relevance of regions in the critical images, and results showed that these inferentially relevant regions predicted differences in other viewers’ eye movements. Thus, viewers’ event models in working memory affect visual attentional selection while viewing visual narratives.