1
Enge K, Rind A, Iber M, Höldrich R, Aigner W. Towards a unified terminology for sonification and visualization. Personal and Ubiquitous Computing 2023; 27:1949-1963. [PMID: 37869040 PMCID: PMC10589160 DOI: 10.1007/s00779-023-01720-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 03/19/2023] [Indexed: 10/24/2023]
Abstract
Both sonification and visualization convey information about data by effectively using our human perceptual system, but their ways to transform the data differ. Over the past 30 years, the sonification community has demanded a holistic perspective on data representation, including audio-visual analysis, several times. A design theory of audio-visual analysis would be a relevant step in this direction. An indispensable foundation for this endeavor is a terminology describing the combined design space. To build a bridge between the domains, we adopt three of the established theoretical constructs from visualization theory for the field of sonification. The three constructs are the spatial substrate, the visual mark, and the visual channel. In our model, we choose time to be the temporal substrate of sonification. Auditory marks are then positioned in time, just as visual marks are positioned in space. Auditory channels are encoded into auditory marks to convey information. The proposed definitions allow discussing visualization and sonification designs as well as multi-modal designs based on a common terminology. While the identified terminology can support audio-visual analytics research, it also provides a new perspective on sonification theory itself.
Affiliation(s)
- Kajetan Enge
  - Institute of Creative Media Technologies, FH St. Pölten, Campusplatz 1, St. Pölten, 3100 Austria
  - Institute of Electronic Music and Acoustics, University of Music and Performing Arts Graz, Leonhardstraße 15, Graz, 8010 Austria
- Alexander Rind
  - Institute of Creative Media Technologies, FH St. Pölten, Campusplatz 1, St. Pölten, 3100 Austria
- Michael Iber
  - Institute of Creative Media Technologies, FH St. Pölten, Campusplatz 1, St. Pölten, 3100 Austria
- Robert Höldrich
  - Institute of Electronic Music and Acoustics, University of Music and Performing Arts Graz, Leonhardstraße 15, Graz, 8010 Austria
- Wolfgang Aigner
  - Institute of Creative Media Technologies, FH St. Pölten, Campusplatz 1, St. Pölten, 3100 Austria
2
Rajasingam SL, Summers RJ, Roberts B. Stream biasing by different induction sequences: Evaluating stream capture as an account of the segregation-promoting effects of constant-frequency inducers. The Journal of the Acoustical Society of America 2018; 144:3409. [PMID: 30599694 DOI: 10.1121/1.5082300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2018] [Accepted: 11/19/2018] [Indexed: 06/09/2023]
Abstract
Stream segregation for a test sequence comprising high-frequency (H) and low-frequency (L) pure tones, presented in a galloping rhythm, is much greater when preceded by a constant-frequency induction sequence matching one subset than by an inducer configured like the test sequence; this difference persists for several seconds. It has been proposed that constant-frequency inducers promote stream segregation by capturing the matching subset of test-sequence tones into an on-going, pre-established stream. This explanation was evaluated using 2-s induction sequences followed by longer test sequences (12-20 s). Listeners reported the number of streams heard throughout the test sequence. Experiment 1 used LHL- sequences and one or other subset of inducer tones was attenuated (0-24 dB in 6-dB steps, and ∞). Greater attenuation usually caused a progressive increase in segregation, towards that following the constant-frequency inducer. Experiment 2 used HLH- sequences and the L inducer tones were raised or lowered in frequency relative to their test-sequence counterparts (ΔfI = 0, 0.5, 1.0, or 1.5 × ΔfT). Either change greatly increased segregation. These results are concordant with the notion of attention switching to new sounds but contradict the stream-capture hypothesis, unless a "proto-object" corresponding to the continuing subset is assumed to form during the induction sequence.
Affiliation(s)
- Saima L Rajasingam
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Brian Roberts
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
3
Decomposing the Garner interference paradigm: evidence for dissociations between macrolevel and microlevel performance. Atten Percept Psychophys 2010; 72:1676-91. [PMID: 20675810 DOI: 10.3758/app.72.6.1676] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Abstract
Three Garner interference experiments are described in which baseline, filtering, and correlated performance were assessed at both a macrolevel (condition average) and microlevel (intertrial contingency), using the pair-wise combinations of auditory pitch, loudness, and location. Discrepancies between pairs of dimensions were revealed between macro- and microlevel estimates of performance and, also, between filtering costs and correlated benefits, relative to baseline. The examination of the intertrial effects associated with filtering costs suggested that effects of increased stimulus uncertainty were mandatory, whereas effects of irrelevant variation were not. The examination of the intertrial effects associated with correlated benefits suggested that the detection of stimulus repetition took precedence over that of stimulus change. Violations of standard horse race accounts of processing did not appear to stem from differences in the absolute or relative speeds of processing between dimensions but, rather, from the special role that certain dimensions (e.g., pitch) may play in certain modalities (e.g., audition). The utility of examining repetition effects is demonstrated by revealing a level of understanding regarding stimulus processing typically hidden by aggregated measures of performance.
4
Lee AKC, Deane-Pratt A, Shinn-Cunningham BG. Localization interference between components in an auditory scene. The Journal of the Acoustical Society of America 2009; 126:2543-55. [PMID: 19894834 PMCID: PMC2787073 DOI: 10.1121/1.3238240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/08/2008] [Revised: 04/24/2009] [Accepted: 08/31/2009] [Indexed: 05/25/2023]
Abstract
Some past studies suggest that when sound elements are heard as one object, the spatial cues in the component elements are integrated to determine perceived location, and that this integration is reduced when the elements are perceived in separate objects. The current study explored how object localization depends on the spatial, spectral, and temporal configurations of sound elements in an auditory scene. Localization results are interpreted in light of results from a series of previous experiments studying perceptual grouping of the same stimuli, e.g., Shinn-Cunningham et al. [Proc. Natl. Acad. Sci. U.S.A. 104, 12223-12227 (2007)]. The current results suggest that the integration (pulling) of spatial information across spectrally interleaved elements is obligatory when these elements are simultaneous, even though past results show that these simultaneous sound elements are not grouped strongly into a single perceptual object. In contrast, perceptually distinct objects repel (push) each other spatially with a strength that decreases as the temporal separation between competing objects increases. These results show that the perceived location of an attended object is not easily predicted by knowledge of how sound elements contribute to the perceived spectro-temporal content of that object.
Affiliation(s)
- Adrian K C Lee
  - Hearing Research Center, Boston University, Boston, MA 02215, USA
5
Lee AKC, Shinn-Cunningham BG. Effects of reverberant spatial cues on attention-dependent object formation. J Assoc Res Otolaryngol 2008; 9:150-60. [PMID: 18214613 DOI: 10.1007/s10162-007-0109-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 07/19/2007] [Accepted: 11/28/2007] [Indexed: 11/25/2022]
Abstract
A recent study showed that when a sound mixture has ambiguous spectrotemporal structure, spatial cues alone are sufficient to change the balance of grouping cues and affect the perceptual organization of the auditory scene. The current study synthesizes similar stimuli in a reverberant setting to see whether the interaural decorrelation caused by reverberant energy reduces the influence of spatial cues on perceptual organization. Results suggest that reverberant spatial cues are less influential on perceptual segregation than anechoic spatial cues. In addition, results replicate an interesting finding from the earlier study, where an ambiguous tone that could logically belong to either a repeating tone sequence or a simultaneous harmonic complex can sometimes "disappear" and never be heard as part of the perceptual foreground, no matter which object a listener attends. As in the previous study, the perceived energy of the ambiguous element does not "trade" between the objects in a complex scene (i.e., the element does not necessarily contribute more to one object when it contributes less to a competing object). Results are consistent with the idea that the perceptual organization of an acoustic mixture depends on what object a listener attends.
Affiliation(s)
- Adrian K C Lee
  - Hearing Research Center, Boston University, Boston, MA 02215, USA
6
Turgeon M, Bregman AS, Roberts B. Rhythmic masking release: effects of asynchrony, temporal overlap, harmonic relations, and source separation on cross-spectral grouping. J Exp Psychol Hum Percept Perform 2006; 31:939-953. [PMID: 16262490 DOI: 10.1037/0096-1523.31.5.939] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Abstract
The rhythm created by spacing a series of brief tones in a regular pattern can be disguised by interleaving identical distractors at irregular intervals. The disguised rhythm can be unmasked if the distractors are allocated to a separate stream from the rhythm by integration with temporally overlapping captors. Listeners identified which of 2 rhythms was presented, and the accuracy and rated clarity of their judgment were used to estimate the fusion of the distractors and captors. The extent of fusion depended primarily on onset asynchrony and degree of temporal overlap. Harmonic relations had some influence, but only an extreme difference in spatial location was effective (dichotic presentation). Both preattentive and attentionally driven processes governed performance.
Affiliation(s)
- Martine Turgeon
  - Sensory Motor Neuroscience Laboratory, School of Psychology, University of Birmingham
- Albert S Bregman
  - Auditory Perception Laboratory, Department of Psychology, McGill University
- Brian Roberts
  - Auditory Perception Laboratory, School of Psychology, University of Birmingham
7
Justus T, List A. Auditory attention to frequency and time: an analogy to visual local-global stimuli. Cognition 2005; 98:31-51. [PMID: 16297675 PMCID: PMC1987383 DOI: 10.1016/j.cognition.2004.11.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 01/09/2004] [Revised: 07/27/2004] [Accepted: 11/11/2004] [Indexed: 10/26/2022]
Abstract
Two priming experiments demonstrated exogenous attentional persistence to the fundamental auditory dimensions of frequency (Experiment 1) and time (Experiment 2). In a divided-attention task, participants responded to an independent dimension, the identification of three-tone sequence patterns, for both prime and probe stimuli. The stimuli were specifically designed to parallel the local-global hierarchical letter stimuli of [Navon D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9, 353-383] and the task was designed to parallel subsequent work in visual attention using Navon stimuli [Robertson, L. C. (1996). Attentional persistence for features of hierarchical patterns. Journal of Experimental Psychology: General, 125, 227-249; Ward, L. M. (1982). Determinants of attention to local and global features of visual forms. Journal of Experimental Psychology: Human Perception and Performance, 8, 562-581]. The results are discussed in terms of previous work in auditory attention and previous approaches to auditory local-global processing.
8
Dyson BJ, Alain C, He Y. Effects of visual attentional load on low-level auditory scene analysis. Cognitive Affective & Behavioral Neuroscience 2005; 5:319-38. [PMID: 16396093 DOI: 10.3758/cabn.5.3.319] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Indexed: 11/08/2022]
Abstract
The sharing of processing resources between the senses was investigated by examining the effects of visual task load on auditory event-related brain potentials (ERPs). In Experiment 1, participants completed both a zero-back and a one-back visual task while a tone pattern or a harmonic series was presented. N1 and P2 waves were modulated by visual task difficulty, but neither mismatch negativity (MMN) elicited by deviant stimuli from the tone pattern nor object-related negativity (ORN) elicited by mistuning from the harmonic series was affected. In Experiment 2, participants responded to identity (what) or location (where) in vision, while ignoring sounds alternating in either pitch (what) or location (where). Auditory ERP modulations were consistent with task difficulty, rather than with task specificity. In Experiment 3, we investigated auditory ERP generation under conditions of no visual task. The results are discussed with respect to a distinction between process-general (N1 and P2) and process-specific (MMN and ORN) auditory ERPs.
Affiliation(s)
- Benjamin J Dyson
  - Department of Psychology, University of Sussex, Falmer, Brighton, England.
9
Bedford FL. Analysis of a Constraint on Perception, Cognition, and Development: One Object, One Place, One Time. J Exp Psychol Hum Percept Perform 2004; 30:907-12. [PMID: 15462628 DOI: 10.1037/0096-1523.30.5.907] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
It has become increasingly common for theories to rely on a constraint that 1 object cannot be in more than 1 place at the same time. Analysis suggests that a 1 object-1 place-1 time constraint as literally stated is false, that a modified constraint is biased toward the visual modality, that it may not be a correct description of the physical world, is not true of how objects must appear on sensory surfaces, and does not mean that 2 simultaneous spatially separated samples must be interpreted as 2 different objects, even for vision. However, once such object numerosity or identity is determined in some other way, then a modified constraint can be used to trigger learning, such as prism adaptation. A far-removed implication is that "Where is an object?" may be a misleading question.
Affiliation(s)
- Felice L Bedford
  - Department of Psychology, University of Arizona, Tucson 85721, USA.