1
|
Ueda K, Hashimoto M, Takeichi H, Wakamiya K. Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretchinga). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:1767-1779. [PMID: 38441439 DOI: 10.1121/10.0025132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 02/16/2024] [Indexed: 03/07/2024]
Abstract
Our previous investigation on the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Udea, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In this previous study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore observed the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% by interrupting it with 20 and 80 ms gaps. Intelligibility improved, however, to 92% and 54% (14% and 21% gains for 20 and 80 ms gaps, respectively) by stretching mosaic segments to fill silent gaps (n = 21). By contrast, the intelligibility was impoverished to a minimum of 9% (7% loss) when stretching stimuli interrupted with 160 ms gaps. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement by stretching, but not for the loss. The probability summation model accounted for "U"-shaped intelligibility curves and the gain and loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.
Collapse
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| | - Masashi Hashimoto
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| | - Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kohei Wakamiya
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| |
Collapse
|
2
|
Ueda K, Doan LLD, Takeichi H. Checkerboard and interrupted speech: Intelligibility contrasts related to factor-analysis-based frequency bandsa). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 154:2010-2020. [PMID: 37782122 DOI: 10.1121/10.0021165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Accepted: 09/08/2023] [Indexed: 10/03/2023]
Abstract
It has been shown that the intelligibility of checkerboard speech stimuli, in which speech signals were periodically interrupted in time and frequency, drastically varied according to the combination of the number of frequency bands (2-20) and segment duration (20-320 ms). However, the effects of the number of frequency bands between 4 and 20 and the frequency division parameters on intelligibility have been largely unknown. Here, we show that speech intelligibility was lowest in four-band checkerboard speech stimuli, except for the 320-ms segment duration. Then, temporally interrupted speech stimuli and eight-band checkerboard speech stimuli came in this order (N = 19 and 20). At the same time, U-shaped intelligibility curves were observed for four-band and possibly eight-band checkerboard speech stimuli. Furthermore, different parameters of frequency division resulted in small but significant intelligibility differences at the 160- and 320-ms segment duration in four-band checkerboard speech stimuli. These results suggest that factor-analysis-based four frequency bands, representing groups of critical bands correlating with each other in speech power fluctuations, work as speech cue channels essential for speech perception. Moreover, a probability summation model for perceptual units, consisting of a sub-unit process and a supra-unit process that receives outputs of the speech cue channels, may account for the U-shaped intelligibility curves.
Collapse
Affiliation(s)
- Kazuo Ueda
- Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| | - Linh Le Dieu Doan
- Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| | - Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| |
Collapse
|
3
|
Murai SA, Riquimaroux H. Long-term changes in cortical representation through perceptual learning of spectrally degraded speech. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023; 209:163-172. [PMID: 36464716 DOI: 10.1007/s00359-022-01593-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2022] [Revised: 11/07/2022] [Accepted: 11/08/2022] [Indexed: 12/07/2022]
Abstract
Listeners can adapt to acoustically degraded speech with perceptual training. The learning processes for long periods underlies the rehabilitation of patients with hearing aids or cochlear implants. Perceptual learning of acoustically degraded speech has been associated with the frontotemporal cortices. However, neural processes during and after long-term perceptual learning remain unclear. Here we conducted perceptual training of noise-vocoded speech sounds (NVSS), which is spectrally degraded signals, and measured the cortical activity for seven days and the follow up testing (approximately 1 year later) to investigate changes in neural activation patterns using functional magnetic resonance imaging. We demonstrated that young adult participants (n = 5) improved their performance across seven experimental days, and the gains were maintained after 10 months or more. Representational similarity analysis showed that the neural activation patterns of NVSS relative to clear speech in the left posterior superior temporal sulcus (pSTS) were significantly different across seven training days, accompanying neural changes in frontal cortices. In addition, the distinct activation patterns to NVSS in the frontotemporal cortices were also observed 10-13 months after the training. We, therefore, propose that perceptual training can induce plastic changes and long-term effects on neural representations of the trained degraded speech in the frontotemporal cortices. These behavioral improvements and neural changes induced by the perceptual learning of degraded speech will provide insights into cortical mechanisms underlying adaptive processes in difficult listening situations and long-term rehabilitation of auditory disorders.
Collapse
Affiliation(s)
- Shota A Murai
- Faculty of Life and Medical Sciences, Doshisha University, 1-3 Miyakodani, Tatara, Kyotanabe, Kyoto, 610-0321, Japan.,International Research Center for Neurointelligence (WPI-IRCN), The University of Tokyo Institutes for Advanced Study, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
| | - Hiroshi Riquimaroux
- Faculty of Life and Medical Sciences, Doshisha University, 1-3 Miyakodani, Tatara, Kyotanabe, Kyoto, 610-0321, Japan.
| |
Collapse
|
4
|
Ueda K, Takeichi H, Wakamiya K. Auditory grouping is necessary to understand interrupted mosaic speech stimuli. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:970. [PMID: 36050149 PMCID: PMC9553289 DOI: 10.1121/10.0013425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/02/2022] [Revised: 07/13/2022] [Accepted: 07/21/2022] [Indexed: 06/15/2023]
Abstract
The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility for mosaic speech in which original speech was segmented in frequency and time and noise-vocoded with the average power in each unit was largely reduced by periodical interruption. At the same time, the intelligibility could be recovered by promoting auditory grouping of the interrupted segments by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was enough ( ≥ 4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because the deprivation of periodicity and temporal fine structure with mosaicking prevented successful auditory grouping for the interrupted segments.
Collapse
Affiliation(s)
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| | - Hiroshige Takeichi
- Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information Research and Development and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Kohei Wakamiya
- Department of Communication Design Science, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| |
Collapse
|
5
|
Bhandari P, Demberg V, Kray J. Semantic Predictability Facilitates Comprehension of Degraded Speech in a Graded Manner. Front Psychol 2021; 12:714485. [PMID: 34566795 PMCID: PMC8459870 DOI: 10.3389/fpsyg.2021.714485] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2021] [Accepted: 08/06/2021] [Indexed: 01/02/2023] Open
Abstract
Previous studies have shown that at moderate levels of spectral degradation, semantic predictability facilitates language comprehension. It is argued that when speech is degraded, listeners have narrowed expectations about the sentence endings; i.e., semantic prediction may be limited to only most highly predictable sentence completions. The main objectives of this study were to (i) examine whether listeners form narrowed expectations or whether they form predictions across a wide range of probable sentence endings, (ii) assess whether the facilitatory effect of semantic predictability is modulated by perceptual adaptation to degraded speech, and (iii) use and establish a sensitive metric for the measurement of language comprehension. For this, we created 360 German Subject-Verb-Object sentences that varied in semantic predictability of a sentence-final target word in a graded manner (high, medium, and low) and levels of spectral degradation (1, 4, 6, and 8 channels noise-vocoding). These sentences were presented auditorily to two groups: One group (n =48) performed a listening task in an unpredictable channel context in which the degraded speech levels were randomized, while the other group (n =50) performed the task in a predictable channel context in which the degraded speech levels were blocked. The results showed that at 4 channels noise-vocoding, response accuracy was higher in high-predictability sentences than in the medium-predictability sentences, which in turn was higher than in the low-predictability sentences. This suggests that, in contrast to the narrowed expectations view, comprehension of moderately degraded speech, ranging from low- to high- including medium-predictability sentences, is facilitated in a graded manner; listeners probabilistically preactivate upcoming words from a wide range of semantic space, not limiting only to highly probable sentence endings. Additionally, in both channel contexts, we did not observe learning effects; i.e., response accuracy did not increase over the course of experiment, and response accuracy was higher in the predictable than in the unpredictable channel context. We speculate from these observations that when there is no trial-by-trial variation of the levels of speech degradation, listeners adapt to speech quality at a long timescale; however, when there is a trial-by-trial variation of the high-level semantic feature (e.g., sentence predictability), listeners do not adapt to low-level perceptual property (e.g., speech quality) at a short timescale.
Collapse
Affiliation(s)
- Pratik Bhandari
- Department of Psychology, Saarland University, Saarbrücken, Germany
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
| | - Vera Demberg
- Department of Language Science and Technology, Saarland University, Saarbrücken, Germany
- Department of Computer Science, Saarland University, Saarbrücken, Germany
| | - Jutta Kray
- Department of Psychology, Saarland University, Saarbrücken, Germany
| |
Collapse
|
6
|
Ueda K, Matsuo I. Intelligibility of chimeric locally time-reversed speech: Relative contribution of four frequency bands. JASA EXPRESS LETTERS 2021; 1:065201. [PMID: 36154368 DOI: 10.1121/10.0005439] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Intelligibility of four-band speech stimuli was investigated (n = 18), such that only one of the frequency bands was preserved, whereas other bands were locally time-reversed (segment duration: 75-300 ms), or vice versa. Intelligibility was best retained (82% at 75 ms) when the second lowest band (540-1700 Hz) was preserved. When the same band was degraded, the largest drop (10% at 300 ms) occurred. The lowest and second highest bands contributed similarly less strongly to intelligibility. The highest frequency band contributed least. A close connection between the second lowest frequency band and sonority was suggested.
Collapse
Affiliation(s)
- Kazuo Ueda
- Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
| | - Ikuo Matsuo
- Department of Information Science, Tohoku Gakuin University, 2-1-1 Tenjinzawa, Izumi-ku, Sendai 981-3193, Japan ,
| |
Collapse
|
7
|
Murai SA, Riquimaroux H. Neural correlates of subjective comprehension of noise-vocoded speech. Hear Res 2021; 405:108249. [PMID: 33894680 DOI: 10.1016/j.heares.2021.108249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Revised: 03/28/2021] [Accepted: 04/06/2021] [Indexed: 10/21/2022]
Abstract
Under an acoustically degraded condition, the degree of speech comprehension fluctuates within individuals. Understanding the relationship between such fluctuations in comprehension and neural responses might reveal perceptual processing for distorted speech. In this study we investigated the cerebral activity associated with the degree of subjective comprehension of noise-vocoded speech sounds (NVSS) using functional magnetic resonance imaging. Our results indicate that higher comprehension of NVSS sentences was associated with greater activation in the right superior temporal cortex, and that activity in the left inferior frontal gyrus (Broca's area) was increased when a listener recognized words in a sentence they did not fully comprehend. In addition, results of laterality analysis demonstrated that recognition of words in an NVSS sentence led to less lateralized responses in the temporal cortex, though a left-lateralization was observed when no words were recognized. The data suggest that variation in comprehension within individuals can be associated with changes in lateralization in the temporal auditory cortex.
Collapse
Affiliation(s)
- Shota A Murai
- Faculty of Life and Medical Sciences, Doshisha University. 1-3 Miyakodani, Tatara, Kyotanabe 610-0321, Kyoto, Japan
| | - Hiroshi Riquimaroux
- Faculty of Life and Medical Sciences, Doshisha University. 1-3 Miyakodani, Tatara, Kyotanabe 610-0321, Kyoto, Japan.
| |
Collapse
|