1. Song Y, Robb MP, Chen Y. Estimates of Speech Efficiency in Monolingual and Bilingual Speakers of English. Folia Phoniatr Logop 2024:1-10. PMID: 39084203. DOI: 10.1159/000540671.
Abstract
Introduction: The Speech Efficiency Score (SES) serves as an acoustic metric for assessing fluency in conversational speech within the temporal domain. This study leverages SES to investigate conversational speech efficiency among native speakers of American English (AE) compared to speakers of Mandarin-accented English (MAE).
Methods: SES, speaking rate, articulation rate, and vocabulary diversity were measured and compared between two groups: native AE speakers and MAE speakers. The study utilized conversational speech samples collected from both groups to analyze these metrics.
Results: Findings indicate a disparity in speaking rate and articulation rate between the AE and MAE groups, with the AE group exhibiting significantly faster speech. However, no significant differences were observed in SES and vocabulary diversity between the two groups.
Conclusion: The results are discussed in the context of the interplay between speaking rate, speech fluency, and vocabulary diversity. These findings shed light on the maintenance of speech efficiency among bilingual speakers, suggesting that despite differences in speaking rate and articulation rate, SES and vocabulary diversity remain comparable between native AE speakers and MAE speakers.
Affiliation(s)
- Yuting Song
- Duquesne-China Health Institute, Duquesne University, Pittsburgh, Pennsylvania, USA
- Department of Speech Language Pathology, School of Rehabilitation, Kunming Medical University, Kunming, China
- Michael P Robb
- Department of Communication Sciences and Disorders, Pennsylvania State University, University Park, Pennsylvania, USA
- Faculty of Health Sciences, University of Canterbury, Christchurch, New Zealand
- Yang Chen
- Duquesne-China Health Institute, Duquesne University, Pittsburgh, Pennsylvania, USA
2. Xie X, Kurumada C. From first encounters to longitudinal exposure: a repeated exposure-test paradigm for monitoring speech adaptation. Front Psychol 2024; 15:1383904. PMID: 38873525. PMCID: PMC11169900. DOI: 10.3389/fpsyg.2024.1383904.
Abstract
Perceptual difficulty with an unfamiliar accent can dissipate within short time scales (e.g., within minutes), reflecting rapid adaptation effects. At the same time, long-term familiarity with an accent is also known to yield stable perceptual benefits. However, whether the long-term effects reflect sustained, cumulative progression from shorter-term adaptation remains unknown. To fill this gap, we developed a web-based, repeated exposure-test paradigm. In this paradigm, short test blocks alternate with exposure blocks, and this exposure-test sequence is repeated multiple times. This design allows for the testing of adaptive speech perception both (a) within the first moments of encountering an unfamiliar accent and (b) over longer time scales such as days and weeks. In addition, we used a Bayesian ideal observer approach to select natural speech stimuli that increase the statistical power to detect adaptation. The current report presents results from a first application of this paradigm, investigating changes in the recognition accuracy of Mandarin-accented speech by native English listeners over five sessions spanning 3 weeks. We found that the recognition of an accent feature (a syllable-final /d/, as in feed, sounding /t/-like) improved steadily over the three-week period. Unexpectedly, however, the improvement was seen with or without exposure to the accent. We discuss possible reasons for this result and implications for conducting future longitudinal studies with repeated exposure and testing.
Affiliation(s)
- Xin Xie
- Department of Language Science, University of California, Irvine, Irvine, CA, United States
- Chigusa Kurumada
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
3. Bieber RE, Gordon-Salant S. Influence of talker and accent variability on rapid adaptation and generalization to non-native accented speech in younger and older adults. Auditory Perception & Cognition 2024; 7:110-139. PMID: 39149599. PMCID: PMC11323066. DOI: 10.1080/25742442.2024.2345568.
Abstract
Introduction: Listeners can rapidly adapt to English speech produced by non-native speakers of English with unfamiliar accents. Prior work has shown that the type and number of talkers contained within a stimulus set may impact the rate and magnitude of learning, as well as any generalization of learning. However, findings across the literature have been inconsistent, with relatively little study of these effects in populations of older listeners.
Methods: In this study, adaptation and generalization to unfamiliar talkers with familiar and unfamiliar accents are studied in younger normal-hearing adults and older adults with and without hearing loss. Rate and magnitude of adaptation are modelled using both generalized linear mixed effects regression and generalized additive mixed effects modelling.
Results: Rate and magnitude of adaptation were not impacted by increasing the number of talkers and/or varying the consistency of non-native English accents across talkers. Increasing the number of talkers did strengthen generalization of learning for a talker with a familiar non-native accent, but not for an unfamiliar accent. Aging alone did not diminish adaptation or generalization.
Discussion: These findings support prior evidence of a limited benefit for talker variability in facilitating generalization of learning for non-native accented speech, and extend the findings to older adults.
Affiliation(s)
- R.E. Bieber
- Department of Hearing and Speech Sciences, University of Maryland College Park, College Park MD, US
- S. Gordon-Salant
- Department of Hearing and Speech Sciences, University of Maryland College Park, College Park MD, US
4. Persson A, Jaeger TF. Evaluating normalization accounts against the dense vowel space of Central Swedish. Front Psychol 2023; 14:1165742. PMID: 37416548. PMCID: PMC10322199. DOI: 10.3389/fpsyg.2023.1165742.
Abstract
Talkers vary in the phonetic realization of their vowels. One influential hypothesis holds that listeners overcome this inter-talker variability through pre-linguistic auditory mechanisms that normalize the acoustic or phonetic cues that form the input to speech recognition. Dozens of competing normalization accounts exist, including both accounts specific to vowel perception and general-purpose accounts that can be applied to any type of cue. We add to the cross-linguistic literature on this matter by comparing normalization accounts against a new phonetically annotated vowel database of Swedish, a language with a particularly dense vowel inventory of 21 vowels differing in quality and quantity. We evaluate normalization accounts on how they differ in predicted consequences for perception. The results indicate that the best performing accounts either center or standardize formants by talker. The study also suggests that general-purpose accounts perform as well as vowel-specific accounts, and that vowel normalization operates in both temporal and spectral domains.
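The centering and standardizing accounts this abstract contrasts can be made concrete. A minimal sketch of by-talker formant standardization (Lobanov-style z-scoring; illustrative code, not the authors' implementation):

```python
import numpy as np

def standardize_by_talker(formants, talkers):
    """Z-score formant values (e.g., F1/F2 in Hz) within each talker.

    formants: (n_tokens, n_formants) array of raw formant measurements
    talkers:  length-n_tokens array of talker labels
    Returns an array of the same shape with per-talker mean 0 and SD 1.
    """
    formants = np.asarray(formants, dtype=float)
    talkers = np.asarray(talkers)
    out = np.empty_like(formants)
    for t in np.unique(talkers):
        mask = talkers == t
        mu = formants[mask].mean(axis=0)   # talker-specific mean per formant
        sd = formants[mask].std(axis=0)    # talker-specific SD per formant
        out[mask] = (formants[mask] - mu) / sd
    return out
```

A centering account would drop the division by `sd`; both variants remove talker-specific shifts in the formant space before vowel categories are compared.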
Affiliation(s)
- Anna Persson
- Department of Swedish Language and Multilingualism, Stockholm University, Stockholm, Sweden
- T. Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
- Computer Science, University of Rochester, Rochester, NY, United States
5. Kapadia AM, Tin JAA, Perrachione TK. Multiple sources of acoustic variation affect speech processing efficiency. J Acoust Soc Am 2023; 153:209. PMID: 36732274. PMCID: PMC9836727. DOI: 10.1121/10.0016611.
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, evident in performance decrements when listening to speech from multiple talkers. However, within-talker phonetic variation is a less well-understood source of variability in speech, and it is unknown how processing costs from within-talker variation compare to those from between-talker variation. Here, listeners performed a speeded word identification task in which three dimensions of variability were factorially manipulated: between-talker variability (single vs multiple talkers), within-talker variability (single vs multiple acoustically distinct recordings per word), and word-choice variability (two- vs six-word choices). All three sources of variability led to reduced speech processing efficiency. Between-talker variability affected both word-identification accuracy and response time, but within-talker variability affected only response time. Furthermore, between-talker variability, but not within-talker variability, had a greater impact when the target phonological contrasts were more similar. Together, these results suggest that natural between- and within-talker variability reflect two distinct magnitudes of common acoustic-phonetic variability: Both affect speech processing efficiency, but they appear to have qualitatively and quantitatively unique effects due to differences in their potential to obscure acoustic-phonemic correspondences across utterances.
Affiliation(s)
- Alexandra M Kapadia
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jessica A A Tin
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
6. Nenadić F, Tucker BV, Ten Bosch L. Computational Modeling of an Auditory Lexical Decision Experiment Using DIANA. Lang Speech 2022:238309221111752. PMID: 36000386. PMCID: PMC10394956. DOI: 10.1177/00238309221111752.
Abstract
We present an implementation of DIANA, a computational model of spoken word recognition, to model responses collected in the Massive Auditory Lexical Decision (MALD) project. DIANA is an end-to-end model, including an activation and decision component that takes the acoustic signal as input, activates internal word representations, and outputs lexicality judgments and estimated response latencies. Simulation 1 presents the process of creating acoustic models required by DIANA to analyze novel speech input. Simulation 2 investigates DIANA's performance in determining whether the input signal is a word present in the lexicon or a pseudoword. In Simulation 3, we generate estimates of response latency and correlate them with general tendencies in participant responses in MALD data. We find that DIANA performs fairly well in free word recognition and lexical decision. However, the current approach for estimating response latency provides estimates opposite to those found in behavioral data. We discuss these findings and offer suggestions as to what a contemporary model of spoken word recognition should be able to do.
Affiliation(s)
- Filip Nenadić
- University of Alberta, Canada; Singidunum University, Serbia
7. Bieber RE, Gordon-Salant S. Semantic context and stimulus variability independently affect rapid adaptation to non-native English speech in young adults. J Acoust Soc Am 2022; 151:242. PMID: 35104999. PMCID: PMC8769767. DOI: 10.1121/10.0009170.
Abstract
When speech is degraded or challenging to recognize, young adult listeners with normal hearing are able to quickly adapt, improving their recognition of the speech over a short period of time. This rapid adaptation is robust, but the factors influencing rate, magnitude, and generalization of improvement have not been fully described. Two factors of interest are lexico-semantic information and talker and accent variability; lexico-semantic information promotes perceptual learning for acoustically ambiguous speech, while talker and accent variability are beneficial for generalization of learning. In the present study, rate and magnitude of adaptation were measured for speech varying in level of semantic context, and in the type and number of talkers. Generalization of learning to an unfamiliar talker was also assessed. Results indicate that rate of rapid adaptation was slowed for semantically anomalous sentences, as compared to semantically intact or topic-grouped sentences; however, generalization was seen in the anomalous conditions. Magnitude of adaptation was greater for non-native as compared to native talker conditions, with no difference between single and multiple non-native talker conditions. These findings indicate that the previously documented benefit of lexical information in supporting rapid adaptation is not enhanced by the addition of supra-sentence context.
Affiliation(s)
- Rebecca E Bieber
- Department of Hearing and Speech Sciences, University of Maryland College Park, College Park, Maryland 20742, USA
- Sandra Gordon-Salant
- Department of Hearing and Speech Sciences, University of Maryland College Park, College Park, Maryland 20742, USA
8. Tan M, Xie X, Jaeger TF. Using Rational Models to Interpret the Results of Experiments on Accent Adaptation. Front Psychol 2021; 12:676271. PMID: 34803790. PMCID: PMC8603310. DOI: 10.3389/fpsyg.2021.676271.
Abstract
Exposure to unfamiliar non-native speech tends to improve comprehension. One hypothesis holds that listeners adapt to non-native-accented speech through distributional learning—by inferring the statistics of the talker's phonetic cues. Models based on this hypothesis provide a good fit to incremental changes after exposure to atypical native speech. These models have, however, not previously been applied to non-native accents, which typically differ from native speech in many dimensions. Motivated by a seeming failure to replicate a well-replicated finding from accent adaptation, we use ideal observers to test whether our results can be understood solely based on the statistics of the relevant cue distributions in the native- and non-native-accented speech. The simple computational model we use for this purpose can be used predictively by other researchers working on similar questions. All code and data are shared.
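The ideal-observer logic the abstract describes (comparing category likelihoods under the cue distributions of each accent) can be sketched in a few lines. The sketch below assumes Gaussian cue distributions; the example parameters for a /d/-/t/ voice-onset-time contrast are hypothetical, not the paper's estimates:

```python
import math

def ideal_observer_posterior(x, categories):
    """Posterior probability of each category given cue value x.

    categories maps a label to (mean, sd, prior); the cue is assumed
    normally distributed within each category.
    """
    likelihood = {}
    for label, (mean, sd, prior) in categories.items():
        density = (math.exp(-0.5 * ((x - mean) / sd) ** 2)
                   / (sd * math.sqrt(2 * math.pi)))
        likelihood[label] = prior * density
    total = sum(likelihood.values())
    return {label: l / total for label, l in likelihood.items()}

# Hypothetical VOT distributions (ms) for a /d/-/t/ contrast
categories = {"d": (20.0, 15.0, 0.5), "t": (70.0, 20.0, 0.5)}
posterior = ideal_observer_posterior(35.0, categories)
```

Fitting (mean, sd) separately to native- and non-native-accented productions is one way to derive the accent-specific predictions against which behavioral results can then be compared.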
Affiliation(s)
- Maryann Tan
- Centre for Research on Bilingualism, Department of Swedish Language & Multilingualism, Stockholm University, Stockholm, Sweden
- Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States
- Xin Xie
- Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States
- Department of Language Science, University of California, Irvine, Irvine, CA, United States
- T Florian Jaeger
- Brain & Cognitive Sciences, University of Rochester, Rochester, NY, United States
- Computer Science, University of Rochester, Rochester, NY, United States
9. Korvel G, Treigys P, Kostek B. Highlighting interlanguage phoneme differences based on similarity matrices and convolutional neural network. J Acoust Soc Am 2021; 149:508. PMID: 33514128. DOI: 10.1121/10.0003339.
Abstract
The goal of this research is to find a way of highlighting the acoustic differences between consonant phonemes of the Polish and Lithuanian languages. For this purpose, similarity matrices are employed based on speech acoustic parameters combined with a convolutional neural network (CNN). In the first experiment, we compare the effectiveness of the similarity matrices in discerning acoustic differences between consonant phonemes of the Polish and Lithuanian languages. Similarity matrices built on both an extensive set of parameters and a reduced set, obtained after removing highly correlated parameters, are used. The results show that higher accuracy is obtained by the similarity matrices that retain the highly correlated parameters. In the second experiment, the averaged accuracies of the similarity matrices are compared with the results provided by spectrograms combined with a CNN, as well as the results of vectors containing acoustic parameters fed to two baseline classifiers, namely k-nearest neighbors and support vector machine. The performance of the similarity matrix approach demonstrates its superiority over the methods used for comparison.
Affiliation(s)
- Gražina Korvel
- Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
- Povilas Treigys
- Institute of Data Science and Digital Technologies, Vilnius University, Vilnius, Lithuania
- Bożena Kostek
- Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland