1
Lindeberg T. A time-causal and time-recursive scale-covariant scale-space representation of temporal signals and past time. Biological Cybernetics 2023; 117:21-59. [PMID: 36689001 PMCID: PMC10160219 DOI: 10.1007/s00422-022-00953-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 11/21/2022] [Indexed: 05/05/2023]
Abstract
This article presents an overview of a theory for performing temporal smoothing on temporal signals in such a way that: (i) temporally smoothed signals at coarser temporal scales are guaranteed to constitute simplifications of corresponding temporally smoothed signals at any finer temporal scale (including the original signal) and (ii) the temporal smoothing process is both time-causal and time-recursive, in the sense that it does not require access to future information and can be performed with no other temporal memory buffer of the past than the resulting smoothed temporal scale-space representations themselves. For specific subsets of parameter settings for the classes of linear and shift-invariant temporal smoothing operators that obey this property, it is shown how temporal scale covariance can be additionally obtained, guaranteeing that if the temporal input signal is rescaled by a uniform temporal scaling factor, then also the resulting temporal scale-space representations of the rescaled temporal signal will constitute mere rescalings of the temporal scale-space representations of the original input signal, complemented by a shift along the temporal scale dimension. The resulting time-causal limit kernel that obeys this property constitutes a canonical temporal kernel for processing temporal signals in real-time scenarios when the regular Gaussian kernel cannot be used, because of its non-causal access to information from the future, and we cannot additionally require the temporal smoothing process to comprise a complementary memory of the past beyond the information contained in the temporal smoothing process itself, which in this way also serves as a multi-scale temporal memory of the past. We describe how the time-causal limit kernel relates to previously used temporal models, such as Koenderink's scale-time kernels and the ex-Gaussian kernel. 
We do also give an overview of how the time-causal limit kernel can be used for modelling the temporal processing in models for spatio-temporal and spectro-temporal receptive fields, and how it more generally has a high potential for modelling neural temporal response functions in a purely time-causal and time-recursive way, that can also handle phenomena at multiple temporal scales in a theoretically well-founded manner. We detail how this theory can be efficiently implemented for discrete data, in terms of a set of recursive filters coupled in cascade. Hence, the theory is generally applicable for both: (i) modelling continuous temporal phenomena over multiple temporal scales and (ii) digital processing of measured temporal signals in real time. We conclude by stating implications of the theory for modelling temporal phenomena in biological, perceptual, neural and memory processes by mathematical models, as well as implications regarding the philosophy of time and perceptual agents. Specifically, we propose that for A-type theories of time, as well as for perceptual agents, the notion of a non-infinitesimal inner temporal scale of the temporal receptive fields has to be included in representations of the present, where the inherent nonzero temporal delay of such time-causal receptive fields implies a need for incorporating predictions from the actual time-delayed present in the layers of a perceptual hierarchy, to make it possible for a representation of the perceptual present to constitute a representation of the environment with timing properties closer to the actual present.
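The discrete implementation mentioned in the abstract, a set of recursive filters coupled in cascade, can be illustrated with a minimal sketch. The abstract does not give the filter equations, so everything here is an assumption: each stage is a normalized first-order recursive filter with time constant `mu`, the temporal scale levels are spaced geometrically with a ratio `c`, and the names `time_constants` and `cascade_smooth` are hypothetical.

```python
import math

def time_constants(tau_max, c=2.0, K=4):
    """Hypothetical helper: time constants for K first-order stages.

    Assumes temporal scale levels (variances) tau_k = c**(2*(k-K)) * tau_max,
    and picks mu_k so that stage k adds variance mu_k**2 + mu_k = tau_k - tau_{k-1}.
    """
    mus, prev = [], 0.0
    for k in range(1, K + 1):
        tau = tau_max * c ** (2 * (k - K))
        dtau = tau - prev
        mus.append((math.sqrt(1.0 + 4.0 * dtau) - 1.0) / 2.0)
        prev = tau
    return mus

def cascade_smooth(signal, mus):
    """Run the signal through the cascade, one sample at a time.

    Only the current state of each stage is stored (time-recursive), and each
    output depends only on present and past samples (time-causal).
    Returns, per time step, the list of all stage outputs (one per scale level).
    """
    states = [0.0] * len(mus)
    outputs = []
    for x in signal:
        inp = x
        for k, mu in enumerate(mus):
            # First-order recursive update with unit DC gain.
            states[k] += (inp - states[k]) / (1.0 + mu)
            inp = states[k]  # output of stage k feeds stage k + 1
        outputs.append(list(states))
    return outputs
```

The only memory carried between samples is the list `states`, which is itself the multi-scale representation; this is the sense in which no additional buffer of past input is needed.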
Affiliation(s)
- Tony Lindeberg
- Computational Brain Science Lab, Division of Computational Science and Technology, KTH Royal Institute of Technology, 100 44, Stockholm, Sweden.
2
Abstract
Perception adapts to the properties of prior stimulation, as illustrated by phenomena such as visual color constancy or speech context effects. In the auditory domain, little is known about adaptive processes when it comes to the attribute of auditory brightness. Here, we report an experiment that tests whether listeners adapt to spectral colorations imposed on naturalistic music and speech excerpts. Our results indicate consistent contrastive adaptation of auditory brightness judgments on a trial-by-trial basis. The pattern of results suggests that these effects tend to grow with an increase in the duration of the adaptor context but level off after around 8 trials of 2 s duration. A simple model of the response criterion yields a correlation of r = .97 with the measured data and corroborates the notion that brightness perception adapts on timescales that fall in the range of auditory short-term memory. Effects turn out to be similar for spectral filtering based on linear spectral filter slopes and filtering based on a measured transfer function from a commercially available hearing device. Overall, our findings demonstrate the adaptivity of auditory brightness perception under realistic acoustical conditions.
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany.
- Feline Malin Barg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Henning Schepker
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Starkey Hearing, Eden Prairie, MN, USA
3
Lindeberg T. Normative theory of visual receptive fields. Heliyon 2021; 7:e05897. [PMID: 33521348 PMCID: PMC7820928 DOI: 10.1016/j.heliyon.2021.e05897] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/15/2020] [Revised: 12/28/2020] [Accepted: 12/31/2020] [Indexed: 11/19/2022] [Open Access]
Abstract
This article gives an overview of a normative theory of visual receptive fields. We describe how idealized functional models of early spatial, spatio-chromatic and spatio-temporal receptive fields can be derived in a principled way, based on a set of axioms that reflect structural properties of the environment in combination with assumptions about the internal structure of a vision system to guarantee consistent handling of image representations over multiple spatial and temporal scales. Interestingly, this theory leads to predictions about visual receptive field shapes with qualitatively very good similarities to biological receptive fields measured in the retina, the LGN and the primary visual cortex (V1) of mammals.
Affiliation(s)
- Tony Lindeberg
- Computational Brain Science Lab, Division of Computational Science and Technology, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden
4
Abstract
We propose a complexity measure for black-and-white (B/W) digital images, based on the detection of typical length scales in the depicted motifs. Complexity is associated with diversity in those length scales. In this sense, the proposed measure penalizes images where typical scales are limited to small lengths, of a few pixels –as in an image where gray levels are distributed at random– or to lengths similar to the image size –as when gray levels are ordered into a simple, broad pattern. We introduce a complexity index which captures the structural richness of images with a wide range of typical scales, and compare several images with each other on the basis of this index. Since the index provides an objective quantification of image complexity, it could be used as the counterpart of subjective visual complexity in experimental perception research. As an application of the complexity index, we build a “complexity map” for South-American topography, by analyzing a large B/W image that represents terrain elevation data in the continent. Results show that the complexity index is able to clearly reveal regions with intricate topographical features such as river drainage networks and fjord-like coasts. Although, for the sake of concreteness, our complexity measure is introduced for B/W images, the definition can be straightforwardly extended to any object that admits a mathematical representation as a function of one or more variables. Thus, the quantification of structural richness can be adapted to time signals and distributions of various kinds.
5
Friberg A, Lindeberg T, Hellwagner M, Helgason P, Salomão GL, Elowsson A, Lemaitre G, Ternström S. Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields. The Journal of the Acoustical Society of America 2018; 144:1467. [PMID: 30424637 DOI: 10.1121/1.5052438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2018] [Accepted: 08/16/2018] [Indexed: 06/09/2023]
Abstract
Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence, has been modeled from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The result with the best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8% for phonation, 90.8% for supraglottal myoelastic vibrations, and 89.0% for turbulence using all 84 developed features. A final feature reduction to 22 features yielded similar results.
Affiliation(s)
- Anders Friberg
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Tony Lindeberg
- Computational Brain Science Lab, Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 5, 10044 Stockholm, Sweden
- Martin Hellwagner
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Pétur Helgason
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Gláucia Laís Salomão
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Anders Elowsson
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Guillaume Lemaitre
- Institute for Research and Coordination in Acoustics and Music, 1 Place Igor Stravinsky, Paris 75004, France
- Sten Ternström
- Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
6
Bach JH, Kollmeier B, Anemüller J. Matching Pursuit Analysis of Auditory Receptive Fields' Spectro-Temporal Properties. Front Syst Neurosci 2017; 11:4. [PMID: 28232791 PMCID: PMC5299023 DOI: 10.3389/fnsys.2017.00004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2016] [Accepted: 01/23/2017] [Indexed: 11/13/2022] [Open Access]
Abstract
Gabor filters have long been proposed as models for spectro-temporal receptive fields (STRFs), with their specific spectral and temporal rate of modulation qualitatively replicating characteristics of STRF filters estimated from responses to auditory stimuli in physiological data. The present study builds on the Gabor-STRF model by proposing a methodology to quantitatively decompose STRFs into a set of optimally matched Gabor filters through matching pursuit, and by quantitatively evaluating spectral and temporal characteristics of STRFs in terms of the derived optimal Gabor-parameters. To summarize a neuron's spectro-temporal characteristics, we introduce a measure for the “diagonality,” i.e., the extent to which an STRF exhibits spectro-temporal transients which cannot be factorized into a product of a spectral and a temporal modulation. With this methodology, it is shown that approximately half of 52 analyzed zebra finch STRFs can each be well approximated by a single Gabor or a linear combination of two Gabor filters. Moreover, the dominant Gabor functions tend to be oriented either in the spectral or in the temporal direction, with truly “diagonal” Gabor functions rarely being necessary for reconstruction of an STRF's main characteristics. As a toy example for the applicability of STRF and Gabor-STRF filters to auditory detection tasks, we use STRF filters as features in an automatic event detection task and compare them to idealized Gabor filters and mel-frequency cepstral coefficients (MFCCs). STRFs classify a set of six everyday sounds with an accuracy similar to reference Gabor features (94% recognition rate). Spectro-temporal STRF and Gabor features outperform reference spectral MFCCs in quiet and in low noise conditions (down to 0 dB signal to noise ratio).
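The decomposition step described in this abstract, greedy matching pursuit over a dictionary of Gabor filters, can be sketched as follows. This is a generic textbook matching pursuit, not the authors' implementation: the Gabor parameterization, the function names `gabor_atom` and `matching_pursuit`, and the use of flattened frequency-by-time patches are all assumptions made for illustration.

```python
import math

def gabor_atom(n_freq, n_time, f0, t0, sigma_f, sigma_t, omega_f, omega_t, phase=0.0):
    """Hypothetical flattened, unit-norm 2D Gabor patch.

    Gaussian envelope centered at (f0, t0) times a cosine carrier with
    spectral modulation omega_f and temporal modulation omega_t. An atom with
    both modulations nonzero is "diagonal" in the sense used above.
    """
    atom = []
    for f in range(n_freq):
        for t in range(n_time):
            env = math.exp(-0.5 * (((f - f0) / sigma_f) ** 2 + ((t - t0) / sigma_t) ** 2))
            atom.append(env * math.cos(omega_f * (f - f0) + omega_t * (t - t0) + phase))
    norm = math.sqrt(sum(v * v for v in atom))
    return [v / norm for v in atom]

def matching_pursuit(signal, dictionary, n_iter):
    """Greedy matching pursuit over unit-norm atoms.

    At each step, pick the atom with the largest absolute inner product with
    the residual, record (atom index, coefficient), and subtract its
    contribution. Returns the picks and the final residual.
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        coeffs = [sum(r * a for r, a in zip(residual, atom)) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda i: abs(coeffs[i]))
        c = coeffs[best]
        residual = [r - c * a for r, a in zip(residual, dictionary[best])]
        picks.append((best, c))
    return picks, residual
```

Stopping after one or two iterations corresponds to approximating an STRF by a single Gabor or a linear combination of two; the energy left in `residual` indicates how good that approximation is.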
Affiliation(s)
- Jörg-Hendrik Bach
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Jörn Anemüller
- Medizinische Physik, Universität Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- *Correspondence: Jörn Anemüller
7
A Review of Physical and Perceptual Feature Extraction Techniques for Speech, Music and Environmental Sounds. Applied Sciences-Basel 2016. [DOI: 10.3390/app6050143] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 11/16/2022]
8
Lindeberg T, Friberg A. Idealized computational models for auditory receptive fields. PLoS One 2015; 10:e0119032. [PMID: 25822973 PMCID: PMC4379182 DOI: 10.1371/journal.pone.0119032] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 04/04/2014] [Accepted: 01/24/2015] [Indexed: 11/19/2022] [Open Access]
Abstract
We present a theory by which idealized models of auditory receptive fields can be derived in a principled axiomatic manner, from a set of structural properties to (i) enable invariance of receptive field responses under natural sound transformations and (ii) ensure internal consistency between spectro-temporal receptive fields at different temporal and spectral scales. For defining a time-frequency transformation of a purely temporal sound signal, it is shown that the framework allows for a new way of deriving the Gabor and Gammatone filters as well as a novel family of generalized Gammatone filters, with additional degrees of freedom to obtain different trade-offs between the spectral selectivity and the temporal delay of time-causal temporal window functions. When applied to the definition of a second-layer of receptive fields from a spectrogram, it is shown that the framework leads to two canonical families of spectro-temporal receptive fields, in terms of spectro-temporal derivatives of either spectro-temporal Gaussian kernels for non-causal time or a cascade of time-causal first-order integrators over the temporal domain and a Gaussian filter over the logspectral domain. For each filter family, the spectro-temporal receptive fields can be either separable over the time-frequency domain or be adapted to local glissando transformations that represent variations in logarithmic frequencies over time. Within each domain of either non-causal or time-causal time, these receptive field families are derived by uniqueness from the assumptions. It is demonstrated how the presented framework allows for computation of basic auditory features for audio processing and that it leads to predictions about auditory receptive fields with good qualitative similarity to biological receptive fields measured in the inferior colliculus (ICC) and primary auditory cortex (A1) of mammals.
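As a worked illustration of how a cascade of time-causal first-order integrators relates to the Gammatone filter named in this abstract, the following uses the standard textbook forms. The abstract gives no formulas, so the symbols $\mu$, $n$, $a$, $b$, $f_c$ and $\varphi$ are assumptions, and the parameterization used in the paper may differ.

```latex
% Impulse response of one first-order integrator with time constant \mu > 0:
h(t;\, \mu) = \frac{1}{\mu}\, e^{-t/\mu}, \qquad t \ge 0 .

% Convolving n such kernels with equal time constants gives a gamma-shaped envelope:
(\underbrace{h * h * \cdots * h}_{n\ \text{times}})(t)
  = \frac{t^{\,n-1}\, e^{-t/\mu}}{\mu^{n}\, \Gamma(n)}, \qquad t \ge 0 .

% The standard Gammatone impulse response has the same envelope times a carrier:
g(t) = a\, t^{\,n-1}\, e^{-2\pi b t} \cos(2\pi f_c t + \varphi), \qquad t \ge 0 ,

% so the two envelopes agree when 1/\mu = 2\pi b.
```

Under this reading, letting the time constants of the individual integrators differ is one way to obtain additional degrees of freedom in the temporal window.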
Affiliation(s)
- Tony Lindeberg
- Department of Computational Biology, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Anders Friberg
- Department of Speech, Music and Hearing, School of Computer Science and Communication, KTH Royal Institute of Technology, Stockholm, Sweden