1.
Phillips MC, Myers EB. Auditory Processing of Speech and Nonspeech in People Who Stutter. J Speech Lang Hear Res 2024;67:2533-2547. PMID: 39058919. DOI: 10.1044/2024_jslhr-24-00107.
Abstract
PURPOSE We investigated speech and nonspeech auditory processing of temporal and spectral cues in people who do and do not stutter. We also asked whether self-reported stuttering severity was predicted by performance on the auditory processing measures. METHOD People who stutter (n = 23) and people who do not stutter (n = 28) completed a series of four auditory processing tasks online. These tasks consisted of speech and nonspeech stimuli differing in spectral or temporal cues. We then used independent-samples t-tests to assess differences in phonetic categorization slopes between groups and linear mixed-effects models to test differences in nonspeech auditory processing between stuttering and nonstuttering groups, and stuttering severity as a function of performance on all auditory processing tasks. RESULTS We found statistically significant differences between people who do and do not stutter in phonetic categorization of a continuum differing in a temporal cue and in discrimination of nonspeech stimuli differing in a spectral cue. A significant proportion of variance in self-reported stuttering severity was predicted by performance on the auditory processing measures. CONCLUSIONS Taken together, these results suggest that people who stutter process both speech and nonspeech auditory information differently than people who do not stutter and may point to subtle differences in auditory processing that could contribute to stuttering. We also note that these patterns could be the consequence of listening to one's own speech, rather than the cause of production differences.
Affiliation(s)
- Matthew C Phillips
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs
- Emily B Myers
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs
- Department of Psychological Sciences, University of Connecticut, Storrs
2.
Nozari N, Martin RC. Is working memory domain-general or domain-specific? Trends Cogn Sci 2024:S1364-6613(24)00164-5. PMID: 39019705. DOI: 10.1016/j.tics.2024.06.006.
Abstract
Given the fundamental role of working memory (WM) in all domains of cognition, a central question has been whether WM is domain-general. However, the term 'domain-general' has been used in different, and sometimes misleading, ways. By reviewing recent evidence and biologically plausible models of WM, we show that the level of domain-generality varies substantially across three facets of WM. In terms of computations, WM is largely domain-general; in terms of neural correlates, it contains both domain-general and domain-specific elements; and in terms of application, it is mostly domain-specific. This variance encourages a shift of focus towards uncovering domain-general computational principles and away from domain-general approaches to the analysis of individual differences and WM training, favoring newer perspectives such as training-as-skill-learning.
Affiliation(s)
- Nazbanou Nozari
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA.
- Randi C Martin
- Department of Psychological Sciences, Rice University, Houston, TX, USA
3.
Sorensen E, Oleson J, Kutlu E, McMurray B. A Bayesian hierarchical model for the analysis of visual analogue scaling tasks. Stat Methods Med Res 2024;33:953-965. PMID: 38573790. DOI: 10.1177/09622802241242319.
Abstract
In psychophysics and psychometrics, a central method involves charting how a person's response pattern changes across a continuum of stimuli. For instance, in hearing science, Visual Analog Scaling (VAS) tasks are experiments in which listeners hear sounds across a speech continuum and give a numeric rating between 0 and 100 conveying whether the sound they heard was more like word "a" or more like word "b" (i.e., each participant gives a continuous categorization response). By taking all the continuous categorization responses across the speech continuum, a parametric curve model can be fit to the data and used to analyze any individual's response pattern for each speech continuum. Standard statistical modeling techniques cannot accommodate all of the specific requirements of these data. Thus, Bayesian hierarchical modeling techniques are employed to accommodate group-level non-linear curves, individual-specific non-linear curves, continuum-level random effects, and a subject-specific variance that is predicted by other model parameters. In this paper, a Bayesian hierarchical model is constructed to model the data from a Visual Analog Scaling task study of monolingual and bilingual participants. Any nonlinear curve function could be used, and we demonstrate the technique using the 4-parameter logistic function. Overall, the model fit the data from the study particularly well, and results suggested that the magnitude of the slope was what most defined the differences in response patterns between continua.
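The 4-parameter logistic at the core of this model is easy to sketch outside the Bayesian hierarchy. The snippet below is a minimal, non-hierarchical illustration on simulated ratings (the data, parameter values, and use of `scipy.optimize.curve_fit` are assumptions for the example, not the paper's model or data):

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, lower, upper, slope, crossover):
    """4-parameter logistic: VAS rating as a function of continuum step."""
    return lower + (upper - lower) / (1.0 + np.exp(-slope * (x - crossover)))

# Simulated 0-100 VAS ratings along a 7-step speech continuum,
# 20 trials per step, with Gaussian response noise.
rng = np.random.default_rng(0)
steps = np.repeat(np.arange(1, 8), 20).astype(float)
ratings = np.clip(four_pl(steps, 5, 95, 1.5, 4.0)
                  + rng.normal(0, 8, steps.size), 0, 100)

params, _ = curve_fit(four_pl, steps, ratings, p0=[0, 100, 1, 4])
lower, upper, slope, crossover = params
```

Fit per participant, the `slope` parameter indexes how categorical versus gradient a listener's responses are; the hierarchical model in the paper shares strength across participants and continua rather than fitting each curve independently.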
Affiliation(s)
- Eldon Sorensen
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Jacob Oleson
- Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Ethan Kutlu
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Department of Linguistics, University of Iowa, Iowa City, IA, USA
- Bob McMurray
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Department of Linguistics, University of Iowa, Iowa City, IA, USA
4.
Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. bioRxiv [Preprint] 2024:2024.05.15.594387. PMID: 38798410. PMCID: PMC11118460. DOI: 10.1101/2024.05.15.594387.
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC than VAS categorization but were relatively immune to noise, suggesting robust access to abstract phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a speech-in-noise comprehension advantage conferred by a gradient listening strategy. At the neural level, electrode-level data revealed that P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS than 2AFC categorization and showed a larger noise-related latency delay in the VAS than in the 2AFC condition. More gradient responders also had smaller shifts in ERP latency with noise, suggesting that their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in the left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous to speech-in-noise perception.
5.
Sarrett ME, Toscano JC. Decoding speech sounds from neurophysiological data: Practical considerations and theoretical implications. Psychophysiology 2024;61:e14475. PMID: 37947235. DOI: 10.1111/psyp.14475.
Abstract
Machine learning techniques have proven to be a useful tool in cognitive neuroscience. However, their implementation in scalp-recorded electroencephalography (EEG) is relatively limited. To address this, we present three analyses using data from a previous study that examined event-related potential (ERP) responses to a wide range of naturally produced speech sounds. First, we explore which features of the EEG signal best maximize machine learning accuracy for a voicing distinction, using a support vector machine (SVM). We manipulate three dimensions of the EEG signal as input to the SVM: number of trials averaged, number of time points averaged, and polynomial fit. We discuss the trade-offs in using different feature sets and offer some recommendations for researchers using machine learning. Next, we use SVMs to classify specific pairs of phonemes, finding that we can detect differences in the EEG signal that are not otherwise detectable using conventional ERP analyses. Finally, we characterize the time course of phonetic feature decoding across three phonological dimensions (voicing, manner of articulation, and place of articulation), and find that voicing and manner are decodable from neural activity, whereas place of articulation is not. This set of analyses addresses both practical considerations in the application of machine learning to EEG, particularly for speech studies, and also sheds light on current issues regarding the nature of perceptual representations of speech.
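As a sketch of the decoding approach (simulated data and an invented averaging helper, not the study's EEG or code), a linear SVM can classify a voicing contrast from trial-averaged waveforms:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Simulated single-trial "ERP" features (trials x time points) with a small
# voicing-dependent deflection buried in noise.
n_trials, n_time = 200, 50
voicing = rng.integers(0, 2, n_trials)           # 0 = voiceless, 1 = voiced
signal = np.outer(voicing, np.hanning(n_time))   # deflection only when voiced
X_single = 0.5 * signal + rng.normal(0.0, 1.0, (n_trials, n_time))

def average_trials(X, y, k):
    """Average non-overlapping groups of k same-class trials to raise SNR."""
    Xs, ys = [], []
    for label in np.unique(y):
        trials = X[y == label]
        for i in range(0, len(trials) - k + 1, k):
            Xs.append(trials[i:i + k].mean(axis=0))
            ys.append(label)
    return np.array(Xs), np.array(ys)

X_avg, y_avg = average_trials(X_single, voicing, k=5)
acc = cross_val_score(SVC(kernel="linear"), X_avg, y_avg, cv=5).mean()
```

The trade-off the paper discusses falls directly out of `k`: averaging more trials gives cleaner features but leaves fewer training samples for the classifier.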
Affiliation(s)
- McCall E Sarrett
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
- Psychology Department, Gonzaga University, Spokane, Washington, USA
- Joseph C Toscano
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
6.
Langus A, Boll-Avetisyan N, van Ommen S, Nazzi T. Music and language in the crib: Early cross-domain effects of experience on categorical perception of prominence in spoken language. Dev Sci 2023;26:e13383. PMID: 36869433. DOI: 10.1111/desc.13383.
Abstract
Rhythm perception helps young infants find structure in both speech and music. However, it remains unknown whether categorical perception of suprasegmental linguistic rhythm, signaled by a co-variation of multiple acoustic cues, can be modulated by prior between-domain (music) and within-domain (language) experience. Here we tested 6-month-old German-learning infants' ability to perceive lexical stress, a linguistic prominence signaled through the co-variation of pitch, intensity, and duration, categorically. By measuring infants' pupil size, we find that infants as a group fail to perceive the co-variation of these acoustic cues as categorical. However, at an individual level, infants with above-average exposure to music and language at home succeeded. Our results suggest that early exposure to music and infant-directed language can boost the categorical perception of prominence. Research highlights:
- 6-month-old German-learning infants' ability to perceive lexical stress prominence categorically depends on exposure to music and language at home.
- Infants with high exposure to music show categorical perception.
- Infants with high exposure to infant-directed language show categorical perception.
- Co-influence of high exposure to music and infant-directed language may be especially beneficial for categorical perception.
- Early exposure to predictable rhythms boosts categorical perception of prominence.
Affiliation(s)
- Alan Langus
- Cognitive Sciences, Department of Linguistics, University of Potsdam, Potsdam, Germany
- Thierry Nazzi
- Integrative Neuroscience and Cognition Center, CNRS - Université Paris Cité, Paris, France
7.
Jesus LMT, Ferreira JFS, Ferreira AJS. Identification of words in whispered speech: The role of cues to fricatives' place and voicing. JASA Express Lett 2023;3:085204. PMID: 37555774. DOI: 10.1121/10.0020302.
Abstract
The temporal distribution of acoustic cues in whispered speech was analyzed using the gating paradigm. Fifteen Portuguese participants listened to real disyllabic words produced by four Portuguese speakers. Lexical choices, confidence scores, isolation points (IPs), and recognition points (RPs) were analyzed. Mixed effects models predicted that the first syllable and 70% of the total duration of the second syllable were needed for lexical choices to be above chance level. Fricatives' place, not voicing, had a significant effect on the percentage of correctly identified words. IP and RP values of words with postalveolar voiced and voiceless fricatives were significantly different.
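Under the usual gating conventions, the isolation point (IP) is the first gate from which the listener's lexical choice is the target word and never changes again; the recognition point (RP) additionally requires a high confidence score. A small helper (a sketch of this convention with hypothetical responses, not the authors' code) might look like:

```python
def isolation_point(responses, target):
    """Index of the first gate from which every subsequent response
    equals the target word; None if the target is never settled on."""
    ip = None
    for i, resp in enumerate(responses):
        if resp == target:
            if ip is None:
                ip = i   # candidate IP: target (re)appears at this gate
        else:
            ip = None    # listener changed away from the target; reset
    return ip

# One lexical choice per gate for a hypothetical disyllabic target:
print(isolation_point(["fato", "fado", "fado", "fado"], "fado"))  # 1
```

Comparing IP distributions across fricative conditions, as the mixed effects models here do, then reduces to comparing these per-word gate indices.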
Affiliation(s)
- Luis M T Jesus
- Intelligent Systems Associate Laboratory (LASI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), School of Health Sciences (ESSUA), University of Aveiro, Aveiro, Portugal
- Joana F S Ferreira
- Institut für Romanische Sprachen und Literaturen, Goethe-Universität, Frankfurt, Germany
- Aníbal J S Ferreira
- Department of Electrical and Computer Engineering, University of Porto, Porto, Portugal
8.
Whitling S, Botzum HM, van Mersbergen MR. Degree of Breathiness in a Synthesized Voice Signal as it Differentiates Masculine versus Feminine Voices. J Voice 2023:S0892-1997(23)00150-9. PMID: 37280147. DOI: 10.1016/j.jvoice.2023.04.022.
Abstract
INTRODUCTION Most studies determining speakers' perceived gender as binarily female or male rely on F0 perception, although other vocal parameters may also contribute to the perception of gender. The current study focused on the impact of breathiness on the perception of a speaker's gender as a biological variable (feminine or masculine). METHODS Thirty-one normal-hearing, native English speakers (18 female, 13 male; mean age 23 years, SD = 3.54) were auditorily and visually trained and then took part in a categorical perception task. A continuum of nine samples of the word "hello" was created in an airway modulation model of speech and voice production. Resting vocal fold length, resting vocal fold thickness, F0, and vocal tract length were fixed, while glottal width at the vocal process, posterior glottal gap, and bronchial pressure were continually modified across stimuli. Each stimulus was randomly presented 30 times within each of five blocks (150 presentations in total), and participants rated each stimulus as binarily female or male. RESULTS Responses showed a sigmoidal shift along the continuum between perceived feminine and masculine voicing. This shift was evident at stimuli four and five, indicating a nonlinear, discrete perception of breathiness among participants. Response times were also significantly slower for these two stimuli, further suggesting categorical perception of breathiness. CONCLUSION Breathiness created by a change in glottal width of at least 0.21 cm may influence the perception of a speaker's gender.
Affiliation(s)
- Susanna Whitling
- Department of Logopedics, Phoniatrics and Audiology, Lund University, Lund, Sweden.
9.
Zhang W, Gu W. F0 range instead of F0 slope is the primary cue for the falling tone of Mandarin. J Acoust Soc Am 2023;153:3439. PMID: 37354204. DOI: 10.1121/10.0019712.
Abstract
It is well known that rising/falling pitch is employed to distinguish the rising (R) or falling (F) tones from the high-level (H) tone in Mandarin, but whether F0 range or F0 slope is the more critical F0 cue to perception is still inconclusive. To clarify this issue quantitatively, we took the F tone as the test case and conducted two-alternative forced choice identification tests on two types of two-dimensional high-level-falling (H-F) tonal continua, one manipulated along F0 range and duration ("F0 range continuum") and the other along F0 slope and duration ("F0 slope continuum"). Experimental results indicated that F0 range was the primary cue because it produced a more robust (less duration-dependent) perceptual boundary than F0 slope. Meanwhile, the perceptual boundary in F0 range was not fully independent of duration but mildly modulated by it, suggesting that duration (or equivalently, F0 slope) played a supplementary role in identifying the H-F tonal contrast.
Affiliation(s)
- Wei Zhang
- Department of Linguistics, McGill University, Montreal, Quebec H3A 1A7, Canada
- Wentao Gu
- School of Chinese Language and Literature, Nanjing Normal University, Nanjing, Jiangsu 210097, China
10.
Winn MB, Wright RA, Tucker BV. Reconsidering classic ideas in speech communication. J Acoust Soc Am 2023;153:1623. PMID: 37002094. DOI: 10.1121/10.0017487.
Abstract
The papers in this special issue provide a critical look at some historical ideas that have influenced research and teaching in the field of speech communication. They also examine widely used methodologies and long-standing methodological challenges in the areas of speech perception and speech production. The goal is to reconsider these historical ideas and evaluate the need for caution, or for their replacement with more modern results and methods. The contributions provide respectful historical context for the classic ideas, as well as new original research or discussion that clarifies the limitations of the original ideas.
Affiliation(s)
- Matthew B Winn
- Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Richard A Wright
- Department of Linguistics, University of Washington, Seattle, Washington 98195, USA
- Benjamin V Tucker
- Department of Communication Sciences and Disorders, Northern Arizona University, Flagstaff, Arizona 86011, USA
11.
Apfelbaum KS, Kutlu E, McMurray B, Kapnoula EC. Don't force it! Gradient speech categorization calls for continuous categorization tasks. J Acoust Soc Am 2022;152:3728. PMID: 36586841. PMCID: PMC9894657. DOI: 10.1121/10.0015201.
Abstract
Research on speech categorization and phoneme recognition has relied heavily on tasks in which participants listen to stimuli from a speech continuum and are asked to either classify each stimulus (identification) or discriminate between them (discrimination). Such tasks rest on assumptions about how perception maps onto discrete responses that have not been thoroughly investigated. Here, we identify critical challenges in the link between these tasks and theories of speech categorization. In particular, we show that patterns that have traditionally been linked to categorical perception could arise despite continuous underlying perception and that patterns that run counter to categorical perception could arise despite underlying categorical perception. We describe an alternative measure of speech perception using a visual analog scale that better differentiates between processes at play in speech categorization, and we review some recent findings that show how this task can be used to better inform our theories.
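The first point, that classically "categorical" identification curves can arise from fully continuous perception, is easy to demonstrate in simulation (the parameter choices below are assumptions for illustration, not the authors' analysis): a percept that is strictly linear in the continuum, once forced through a binary 2AFC response, yields a sigmoid with compressed differences at the endpoints and an inflated jump at the boundary.

```python
import numpy as np

rng = np.random.default_rng(2)
steps = np.arange(1.0, 10.0)   # 9-step continuum
n_rep = 500                    # trials per step

# Underlying percept: strictly linear in the continuum (fully gradient),
# plus Gaussian trial-to-trial noise.
percept = steps[:, None] + rng.normal(0.0, 1.0, (steps.size, n_rep))

# 2AFC forces each trial into a binary label at a criterion of 5.
p_b = (percept > 5.0).mean(axis=1)   # proportion of "b" responses per step

# Sigmoid signature despite linear perception: a step across the criterion
# moves responses far more than an equal-sized step at the endpoint.
mid_jump = p_b[4] - p_b[3]    # step 4 -> 5, across the criterion
edge_jump = p_b[8] - p_b[7]   # step 8 -> 9, far from the criterion
```

A VAS response applied to the same percepts would instead recover the linear function directly, which is the argument for continuous categorization tasks.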
Affiliation(s)
- Keith S Apfelbaum
- Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Ethan Kutlu
- Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Bob McMurray
- Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Efthymia C Kapnoula
- BCBL, Basque Center on Cognition, Brain and Language, Mikeletegi 69, 20009 Donostia, Spain