151. Broś K, Lipowska K. Gran Canarian Spanish Non-Continuant Voicing: Gradiency, Sex Differences and Perception. Phonetica 2019;76:100-125. PMID: 31112961. DOI: 10.1159/000494928.
Abstract
BACKGROUND/AIMS This paper examines the process of postvocalic voicing in the Spanish of Gran Canaria from the point of view of language change. A perception-production study was designed to measure the extent of variation in speaker productions, explore the degree to which production is affected by perception and identify variables that can be considered markers of sound change in progress. METHODS 20 native speakers of the dialect were asked to repeat auditory input data containing voiceless non-continuants with and without voicing. RESULTS Input voicing has no effect on output pronunciations, but voicing is highly variable, with both phonetic and social factors involved. Most importantly, a clear lenition pattern was identified based on such indicators as consonant duration, intensity ratio, absence of burst and presence of formants, with the velar /k/ as the most affected segment. Furthermore, strong social implications were identified: voicing degrees and rates depend both on the level of education and on the gender of the speaker. CONCLUSION The results of the study suggest that the interplay of external and internal factors must be investigated more thoroughly to better address the question of phonetic variation and phonologisation of contrasts in the context of language change.
152. Kuznetsova N, Verkhodanova V. Phonetic Realisation and Phonemic Categorisation of the Final Reduced Corner Vowels in the Finnic Languages of Ingria. Phonetica 2019;76:201-233. PMID: 31112960. DOI: 10.1159/000494927.
Abstract
Individual variability in sound change was explored at three stages of final vowel reduction and loss in the endangered Finnic varieties of Ingria (subdialects of Ingrian, Votic and Ingrian Finnish). The correlation between the realisation of reduced vowels and their phonemic categorisation by speakers was studied. The correlated results showed that if V was pronounced >70%, its starting loss was not yet perceived, apart from certain frequent elements, but after >70% loss, V was not perceived any more. A split of 50/50 between V and loss in production correlated with the same split in categorisation. At the beginning of a sound change, production is, therefore, more innovative, but after reanalysis, categorisation becomes more innovative and leads the change. The vowel a was the most innovative in terms of loss, u/o were the most conservative, and i was in the middle, while consonantal palatalisation was more salient than labialisation. These differences are based on acoustics, articulation and perception.
153. Al-Hameed S, Benaissa M, Christensen H, Mirheidari B, Blackburn D, Reuber M. A new diagnostic approach for the identification of patients with neurodegenerative cognitive complaints. PLoS One 2019;14:e0217388. PMID: 31125389. PMCID: PMC6534304. DOI: 10.1371/journal.pone.0217388.
Abstract
Neurodegenerative diseases causing dementia are known to affect a person's speech and language. Part of the expert assessment in memory clinics therefore routinely focuses on detecting such features. The current outpatient procedures examining patients' verbal and interactional abilities mainly focus on verbal recall, word fluency, and comprehension. By capturing neurodegeneration-associated characteristics in a person's voice, novel methods based on the automatic analysis of speech signals may give us more information about a person's ability to interact, which could contribute to the diagnostic process. In this proof-of-principle study, we demonstrate that purely acoustic features, extracted from recordings of patients' answers to a neurologist's questions in a specialist memory clinic, can support the initial distinction between patients presenting with cognitive concerns attributable to progressive neurodegenerative disorders (ND) and those with Functional Memory Disorder (FMD, i.e., subjective memory concerns unassociated with objective cognitive deficits or a risk of progression). The study involved 15 FMD and 15 ND patients, and a total of 51 acoustic features were extracted from the recordings. Feature selection was used to identify the most discriminating features, which were then used to train five different machine learning classifiers to differentiate between the FMD and ND classes, achieving a mean classification accuracy of 96.2%. Purely acoustic approaches could be integrated into diagnostic pathways for patients presenting with memory concerns, and they are computationally less demanding than methods focusing on linguistic elements of speech and language, which require automatic speech recognition and understanding.
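The pipeline this abstract describes (extract acoustic features, select the most discriminating ones, train classifiers) can be sketched in outline. The code below is an illustrative reconstruction, not the authors' implementation: the synthetic data, the logistic-regression classifier, and the choice of k = 5 selected features are assumptions for demonstration only.

```python
# Illustrative sketch (not the authors' pipeline): select discriminative
# features, then classify two diagnostic groups with leave-one-out CV.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_per_class, n_features = 15, 51          # mirrors 15 FMD / 15 ND, 51 features
X = rng.normal(size=(2 * n_per_class, n_features))
y = np.repeat([0, 1], n_per_class)        # 0 = FMD, 1 = ND (coding hypothetical)
X[y == 1, :5] += 2.0                      # make 5 synthetic features informative

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),          # keep the k most discriminating features
    LogisticRegression(max_iter=1000),
)
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"LOO accuracy: {acc:.3f}")
```

Feature selection is placed inside the cross-validation pipeline so that it is refit on each training fold, avoiding the information leak that selecting features on the full dataset would cause.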
154. Palaparthi A, Maxfield L, Titze IR. Estimation of Source-Filter Interaction Regions Based on Electroglottography. J Voice 2019;33:269-276. PMID: 29277351. PMCID: PMC6014870. DOI: 10.1016/j.jvoice.2017.11.012.
Abstract
Source-filter interaction is a phenomenon in which acoustic airway pressures influence the glottal airflow at the source (level 1) and the vibration pattern of the vocal folds (level 2). This interaction is most significant when dominant source harmonics are near airway resonances. The influence of acoustic airway pressures on vocal fold vibration (level 2) was studied systematically by changing the supraglottal vocal tract length in human subjects with tube extensions. The subjects were asked to perform fundamental frequency (fo) glides while phonating through tubes of various lengths. An algorithm was developed using the quasi-open quotient extracted from the electroglottograph. Regions of sudden change in vocal fold vibration pattern due to source-filter interaction were inferred from contact area changes. The algorithm correctly identified 89% of male and 84.8% of female quantal changes in contact pattern associated with interactions between source harmonics and formants during ascending glides. During descending glides, the algorithm correctly identified 84% of male and 81.1% of female quantal changes in contact pattern. These results are comparable to those obtained with the fo-based algorithm of Maxfield et al.
155. Venezia JH, Martin AG, Hickok G, Richards VM. Identification of the Spectrotemporal Modulations That Support Speech Intelligibility in Hearing-Impaired and Normal-Hearing Listeners. J Speech Lang Hear Res 2019;62:1051-1067. PMID: 30986140. PMCID: PMC6802883. DOI: 10.1044/2018_jslhr-h-18-0045.
Abstract
Purpose Age-related sensorineural hearing loss can dramatically affect speech recognition performance due to reduced audibility and suprathreshold distortion of spectrotemporal information. Normal aging produces changes within the central auditory system that impose further distortions. The goal of this study was to characterize the effects of aging and hearing loss on perceptual representations of speech. Method We asked whether speech intelligibility is supported by different patterns of spectrotemporal modulations (STMs) in older listeners compared to young normal-hearing listeners. We recruited 3 groups of participants: 20 older hearing-impaired (OHI) listeners, 19 age-matched normal-hearing listeners, and 10 young normal-hearing (YNH) listeners. Listeners performed a speech recognition task in which randomly selected regions of the speech STM spectrum were revealed from trial to trial. The overall amount of STM information was varied using an up-down staircase to hold performance at 50% correct. Ordinal regression was used to estimate weights showing which regions of the STM spectrum were associated with good performance (a "classification image" or CImg). Results The results indicated that (a) large-scale CImg patterns did not differ between the 3 groups; (b) weights in a small region of the CImg decreased systematically as hearing loss increased; (c) CImgs were also nonsystematically distorted in OHI listeners, and the magnitude of this distortion predicted speech recognition performance even after accounting for audibility; and (d) YNH listeners performed better overall than the older groups. Conclusion We conclude that OHI/older normal-hearing listeners rely on the same speech STMs as YNH listeners but encode this information less efficiently. Supplemental Material https://doi.org/10.23641/asha.7859981.
156. Lee B, Jia Y, Mirbozorgi SA, Connolly M, Tong X, Zeng Z, Mahmoudi B, Ghovanloo M. An Inductively-Powered Wireless Neural Recording and Stimulation System for Freely-Behaving Animals. IEEE Trans Biomed Circuits Syst 2019;13:413-424. PMID: 30624226. PMCID: PMC6510586. DOI: 10.1109/tbcas.2019.2891303.
Abstract
An inductively-powered wireless integrated neural recording and stimulation (WINeRS-8) system-on-a-chip (SoC), compatible with the EnerCage-HC2 for wireless/battery-less operation, is presented for neuroscience experiments on freely behaving animals. WINeRS-8 includes a 32-channel recording analog front end, a 4-channel current-controlled stimulator, and a 434 MHz on-off keying data link to an external software-defined radio wideband receiver (Rx). The headstage also has a Bluetooth Low Energy link for controlling the SoC. Together, the WINeRS-8 and EnerCage-HC2 systems form a bidirectional wireless and battery-less neural interface within a standard homecage, which can support longitudinal experiments in an enriched environment. Both systems were verified in vivo in a rat model, and the recorded signals were compared with hardwired and battery-powered recording results. Real-time stimulation and recording verified the system's potential for bidirectional neural interfacing within the homecage, while continuously delivering 35 mW to the hybrid WINeRS-8 headstage over an unlimited period.
157. Van Wert JC, Mensinger AF. Seasonal and Daily Patterns of the Mating Calls of the Oyster Toadfish, Opsanus tau. Biol Bull 2019;236:97-107. PMID: 30933642. DOI: 10.1086/701754.
Abstract
Acoustic communication is vital across many taxa for mating behavior, defense, and social interactions. Male oyster toadfish, Opsanus tau, produce courtship calls, or "boatwhistles," characterized by an initial broadband segment (30-50 ms) and a longer tone-like second part (200-650 ms) during the mating season. Male calls were monitored continuously with an in situ SoundTrap hydrophone deployed in Eel Pond, Woods Hole, Massachusetts, during the 2015 mating season. At least 10 vocalizing males were positively identified by their unique acoustic signatures. This resident population was tracked throughout the season, with several individuals tracked for extended periods of time (72 hours). Toadfish began calling in mid-May, when water temperature reached 14.6 °C, with early-season "precursor" boatwhistles that were shorter in duration and contained less distinct tonal segments than calls later in the season. The resident toadfish stopped calling in mid-August, when water temperature was about 25.5 °C. The pulse repetition rate of the tonal part of the call was significantly related to ambient water temperature during both short-term (hourly) and long-term (weekly) monitoring. This was the first study to monitor individuals in the same population of oyster toadfish in situ continuously throughout the mating season.
158. Wang ZT, Akamatsu T, Nowacek DP, Yuan J, Zhou L, Lei PY, Li J, Duan PX, Wang KX, Wang D. Soundscape of an Indo-Pacific humpback dolphin (Sousa chinensis) hotspot before windfarm construction in the Pearl River Estuary, China: Do dolphin engage in noise avoidance and passive eavesdropping behavior? Mar Pollut Bull 2019;140:509-522. PMID: 30803672. DOI: 10.1016/j.marpolbul.2019.02.013.
Abstract
Soundscapes are vital to acoustically specialized animals. Using passive acoustic monitoring data, the temporal and spectral variations in the soundscape of a Chinese white dolphin hotspot were analyzed. Cluster analysis grouped the 1/3-octave-band power spectrum into three bands with median overall contribution rates of 35.24%, 14.14% and 30.61%. Significant diel and tidal soundscape variations were identified with a generalized linear model. Temporal patterns and frequency ranges of the middle-frequency band matched well with those of fish vocalization, indicating that fish might serve as a signal source. Dolphin sounds were mainly detected in periods with low levels of ambient sound and without fish vocalization, which could reflect noise avoidance and passive eavesdropping behaviors by this predator. Pre-construction data can be used to assess the effects of offshore windfarms on acoustic environments and aquatic animals through comparison with post-construction and/or post-mitigation soundscapes.
159.
Abstract
Eight children with articulation disorders were studied to compare the occurrence of phonological processes across three elicitation methods: single-word productions, imitated sentences, and continuous speech sampling. The subjects exhibited a total of 11 phonological processes, and only the process of gliding showed significantly different rates of occurrence among the three procedures. These subjects were relatively consistent in their use of phonological processes under different speaking conditions.
160. Garellek M. Acoustic Discriminability of the Complex Phonation System in !Xóõ. Phonetica 2019;77:131-160. PMID: 30739113. DOI: 10.1159/000494301.
Abstract
Phonation types, or contrastive voice qualities, are minimally produced using complex movements of the vocal folds, but may additionally involve constriction in the supraglottal and pharyngeal cavities. These complex articulations in turn produce a multidimensional acoustic output that can be modeled in various ways. In this study, I investigate whether the psychoacoustic model of voice by Kreiman et al. (2014) succeeds at distinguishing six phonation types of !Xóõ. Linear discriminant analysis is performed using parameters from the model averaged over the entire vowel as well as for the first and final halves of the vowel. The results indicate very high classification accuracy for all phonation types. Measures averaged over the vowel's entire duration are closely correlated with the discriminant functions, suggesting that they are sufficient for distinguishing even dynamic phonation types. Measures from all classes of parameters are correlated with the linear discriminant functions; in particular, the "strident" vowels, which are harsh in quality, are characterized by their noise, changes in spectral tilt, decrease in voicing amplitude and frequency, and raising of the first formant. Despite the large number of contrasts and the time-varying characteristics of many of the phonation types, the phonation contrasts in !Xóõ remain well differentiated acoustically.
161. Oikarinen T, Srinivasan K, Meisner O, Hyman JB, Parmar S, Fanucci-Kiss A, Desimone R, Landman R, Feng G. Deep convolutional network for animal sound classification and source attribution using dual audio recordings. J Acoust Soc Am 2019;145:654. PMID: 30823820. PMCID: PMC6786887. DOI: 10.1121/1.5087827.
Abstract
This paper introduces an end-to-end feedforward convolutional neural network that is able to reliably classify the source and type of animal calls in a noisy environment using two streams of audio data after being trained on a dataset of modest size and imperfect labels. The data consists of audio recordings from captive marmoset monkeys housed in pairs, with several other cages nearby. The network in this paper can classify both the call type and which animal made it with a single pass through a single network using raw spectrogram images as input. The network vastly increases data analysis capacity for researchers interested in studying marmoset vocalizations, and allows data collection in the home cage, in group housed animals.
162. Shamma S, Dutta K. Spectro-temporal templates unify the pitch percepts of resolved and unresolved harmonics. J Acoust Soc Am 2019;145:615. PMID: 30823787. PMCID: PMC6910008. DOI: 10.1121/1.5088504.
Abstract
Pitch is a fundamental attribute in auditory perception involved in source identification and segregation, music, and speech understanding. Pitch percepts are intimately related to the harmonic resolvability of sound. When harmonics are well-resolved, the induced pitch is usually salient and precise, and several models relying on autocorrelations or harmonic spectral templates can account for these percepts. However, when harmonics are not completely resolved, the pitch percept becomes less salient and poorly discriminated, with an upper range limited to a few hundred hertz, and spectral templates fail to convey the percept since only temporal cues are available. Here, a biologically-motivated model is presented that combines spectral and temporal cues to account for both percepts. The model explains how temporal analysis to estimate the pitch of the unresolved harmonics is performed by bandpass filters implemented by resonances in the dendritic trees of neurons in the early auditory pathway. It is demonstrated that organizing and exploiting such dendritic tuning can occur spontaneously in response to white noise. This paper then shows how temporal cues of unresolved harmonics may be integrated with spectrally resolved harmonics, creating spectro-temporal harmonic templates for all pitch percepts. Finally, the model extends its account of monaural pitch percepts to pitches evoked by dichotic binaural stimuli.
163. Bliefnick JM, Ryherd EE, Jackson R. Evaluating hospital soundscapes to improve patient experience. J Acoust Soc Am 2019;145:1117. PMID: 30823810. DOI: 10.1121/1.5090493.
Abstract
Hospital soundscapes can be difficult environments to assess acoustically due to alarms, medical equipment, and the continuous activity within units. Routinely, patients perceive these soundscapes to be poor when rating their hospital experience on HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) surveys administered after discharge. In this study, five hospital units of widely varying HCAHPS "quietness" performance were analyzed. Sound pressure levels were measured in 15 patient rooms and 5 nursing stations over 24-h periods. HCAHPS "quietness of the hospital environment" survey data were correlated with the measured acoustical data at room level, revealing acoustical metrics linked to patient perceptions of hospital soundscape conditions. Metrics that were statistically correlated (p < 0.05) included the absolute LAMIN levels in patient rooms, with significantly higher HCAHPS quietness scores in units whose average LAMIN levels were below 35 dBA, as well as specific low-frequency octave bands and occurrence rates. Many other standard acoustical metrics (such as LAEQ, LAMAX, LCPEAK, and LA90) were not significantly correlated with HCAHPS quietness responses. Taken as a whole, this study provides insights into the potential relationships between hospital noise and patient satisfaction.
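As a point of reference for the level metrics named in this abstract, the following is a minimal sketch of how LAEQ, LAMAX, LAMIN, and LA90 are conventionally computed from a series of short-term A-weighted levels; the input values are invented for illustration.

```python
# Sketch of standard level metrics computed from a time series of
# A-weighted sound pressure levels (dBA); input values are invented.
import numpy as np

def level_metrics(laf: np.ndarray) -> dict:
    """Summarize a series of short-term A-weighted levels (dBA)."""
    energy = 10 ** (laf / 10)                     # dB -> linear energy
    return {
        "LAeq": 10 * np.log10(energy.mean()),     # energy-equivalent level
        "LAmax": laf.max(),
        "LAmin": laf.min(),
        "LA90": np.percentile(laf, 10),           # level exceeded 90% of the time
    }

levels = np.array([40.0, 42.0, 55.0, 38.0, 41.0, 39.0])
m = level_metrics(levels)
print({k: round(v, 1) for k, v in m.items()})
```

Note that a single loud excursion (55 dBA here) pulls LAeq well above the arithmetic mean of the levels, which is one reason studies like this one examine several metrics rather than LAeq alone.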
164. Topper VY, Reilly MP, Wagner LM, Thompson LM, Gillette R, Crews D, Gore AC. Social and neuromolecular phenotypes are programmed by prenatal exposures to endocrine-disrupting chemicals. Mol Cell Endocrinol 2019;479:133-146. PMID: 30287398. PMCID: PMC6263824. DOI: 10.1016/j.mce.2018.09.010.
Abstract
Exposures to endocrine-disrupting chemicals (EDCs) affect the development of hormone-sensitive neural circuits, the proper organization of which is necessary for the manifestation of appropriate adult social and sexual behaviors. We examined whether prenatal exposure to polychlorinated biphenyls (PCBs), a family of ubiquitous industrial contaminants detectable in virtually all humans and wildlife, caused changes in sexually dimorphic social interactions and communications, and profiled the underlying neuromolecular phenotype. Rats were treated with a PCB commercial mixture, Aroclor 1221 (A1221), estradiol benzoate (EB) as a positive control for the estrogenic effects of A1221, or the vehicle (4% DMSO), on embryonic days (E) 16 and 18. In adult F1 offspring, we first conducted tests of ultrasonic vocalization (USV) calls in a sociosexual context as a measure of motivated communications. Numbers of certain USV call types were significantly increased by prenatal treatment with A1221 in males, and decreased by EB in females. In a test of sociosexual preference for a hormone-primed vs. a non-hormone-primed opposite-sex conspecific, male (but not female) nose-touching with opposite-sex rats was significantly diminished by EDCs. Gene expression profiling was conducted in two brain regions that are part of the social decision-making network in the brain: the medial preoptic nucleus (MPN) and the ventromedial nucleus (VMN). In both regions, many more genes were affected by A1221 or EB in females than in males. In the female MPN, A1221 changed the expression of steroid hormone receptor and neuropeptide genes (e.g., Ar, Esr1, Esr2, and Kiss1). In the male MPN, only Per2 was affected by A1221. The VMN had a number of genes affected by EB compared to vehicle (females: Kiss1, Kiss1r, Pgr; males: Crh) but not by A1221. These differences between EB and A1221 indicate that the mechanism of action of A1221 goes beyond estrogenic pathways. These data show sex-specific effects of prenatal PCBs on adult behaviors and the neuromolecular phenotype.
165. Schellenberg M, Gick B. Microtonal Variation in Sung Cantonese. Phonetica 2018;77:83-106. PMID: 30517947. PMCID: PMC9059676. DOI: 10.1159/000493755.
Abstract
BACKGROUND/AIMS Both music and language impose constraints on fundamental frequency (F0) in sung music. Composers are known to set words of tone languages to music in a way that reflects tone height but fails to include tone contour. This study tests whether choral singers add linguistic tone contour information to an unfamiliar song by examining whether Cantonese singers make use of microtonal variation. METHODS 12 native Cantonese-speaking non-professional choral singers learned and sang a novel song in Cantonese which included a minimal set of the Cantonese tones to probe whether everyday singers add in missing contour information. RESULTS Cantonese singers add in a rising F0 contour of less than a semitone when singing syllables with lexical rising tones. This microtonal variation is not observed when singing in a lower register. CONCLUSION Cantonese singers use microtonal contours to reflect rising contours of Cantonese linguistic tones.
166. Bolyanatz MA. Evidence for Incomplete Neutralization in Chilean Spanish. Phonetica 2018;77:107-130. PMID: 30513527. DOI: 10.1159/000493393.
Abstract
BACKGROUND/AIMS In Chilean Spanish, syllable- and word-final /s/ are frequently weakened to an [h]-like segment or completely deleted. In word-final position, /s/ serves as the plural morpheme, so its deletion renders a site for potential neutralization with singular items. Chilean scholars have previously described differences in the vowel preceding weakened or deleted /s/ distinguishing it from non-/s/-final words, but this putative incomplete neutralization has not yet been acoustically verified, nor have its conditioning factors been explored. The primary purpose of this study was to assess via phonetic analysis of spontaneous speech whether neutralization of final vowels in singular words and plural words in Chilean Spanish is indeed incomplete, as hypothesized by scholars during the 20th century. Additionally, these vowels were also compared to the vowels of monomorphemic /s/-final words in order to ensure that the attested singular-versus-plural differences were not simply indicative of closed syllable laxing processes. METHODS Vowels were extracted from the spontaneous speech of 20 Chilean Spanish speakers and acoustically analyzed via VoiceSauce. RESULTS The results revealed that final /a/ vowels of plural words were found to be breathier than singular vowels but less breathy than the final vowels of monomorphemic words, and that plural /o/ was significantly fronted. They also demonstrated increased breathiness on /e/ vowels closed by /s/, regardless of morphological status. CONCLUSION These results provide the first account of incomplete neutralization of plural vowel correlates in spontaneous speech in Chilean Spanish, and they offer evidence for closed syllable processes in this particular dialect, in alignment with an exemplar-theoretic approach.
167. Arsenali B, van Dijk J, Ouweltjes O, den Brinker B, Pevernagie D, Krijn R, van Gilst M, Overeem S. Recurrent Neural Network for Classification of Snoring and Non-Snoring Sound Events. Annu Int Conf IEEE Eng Med Biol Soc 2018;2018:328-331. PMID: 30440404. DOI: 10.1109/embc.2018.8512251.
Abstract
Obstructive sleep apnea (OSA) is a disorder that affects up to 38% of the western population. It is characterized by repetitive episodes of partial or complete collapse of the upper airway during sleep. These episodes are almost always accompanied by loud snoring. Questionnaires such as STOP-BANG exploit snoring to screen for OSA. However, they are not quantitative and thus do not exploit its full potential. A method for automatic detection of snoring in whole-night recordings is required to enable its quantitative evaluation. In this study, we propose such a method. The centerpiece of the proposed method is a recurrent neural network for modeling of sequential data with variable length. Mel-frequency cepstral coefficients, which were extracted from snoring and non-snoring sound events, were used as inputs to the proposed network. A total of 20 subjects referred to clinical sleep recording were also recorded by a microphone that was placed 70 cm from the top end of the bed. These recordings were used to assess the performance of the proposed method. When it comes to the detection of snoring events, our results show that the proposed method has an accuracy of 95%, sensitivity of 92%, and specificity of 98%. In conclusion, our results suggest that the proposed method may improve the process of snoring detection and with that the process of OSA screening. Follow-up clinical studies are required to confirm this potential.
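The key property of the recurrent approach described above is that one network handles sound events of any duration. The toy cell below illustrates this schematically; it uses random weights and is not the authors' trained model, and the dimensions (13 MFCCs, 8 hidden units) are assumptions for demonstration.

```python
# Conceptual sketch: a minimal recurrent unit that consumes a
# variable-length sequence of MFCC frames and emits one snore score.
# Weights are random, purely to illustrate the mechanism.
import numpy as np

rng = np.random.default_rng(1)
n_mfcc, n_hidden = 13, 8
Wx = rng.normal(scale=0.1, size=(n_hidden, n_mfcc))   # input-to-hidden
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden)) # hidden-to-hidden
w_out = rng.normal(scale=0.1, size=n_hidden)          # hidden-to-output

def score(frames: np.ndarray) -> float:
    """frames: (T, n_mfcc) MFCC sequence of any length T."""
    h = np.zeros(n_hidden)
    for x in frames:                          # one recurrent step per frame
        h = np.tanh(Wx @ x + Wh @ h)
    return 1 / (1 + np.exp(-(w_out @ h)))     # sigmoid -> pseudo P(snore)

short = rng.normal(size=(20, n_mfcc))         # a short sound event
long_ = rng.normal(size=(75, n_mfcc))         # a longer one, same network
s1, s2 = score(short), score(long_)
print(s1, s2)
```

Because the hidden state is updated frame by frame, the same parameter set produces a single score regardless of event length, which is what lets such a model classify whole-night recordings without segmenting events to a fixed size.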
168. De Vreese S, van der Schaar M, Weissenberger J, Erbs F, Kosecka M, Solé M, André M. Marine mammal acoustic detections in the Greenland and Barents Sea, 2013-2014 seasons. Sci Rep 2018;8:16882. PMID: 30442965. PMCID: PMC6237968. DOI: 10.1038/s41598-018-34624-z.
Abstract
While the Greenland and Barents Seas are known habitats for several cetacean and pinniped species, there is a lack of long-term monitoring data in this rapidly changing environment. Moreover, little is known of the ambient soundscapes, and increasing off-shore anthropogenic activities can influence the ecosystem and marine life. Baseline acoustic data are needed to better assess current and future soundscape and ecosystem conditions. The analysis of a year of continuous data from three passive acoustic monitoring devices revealed species-dependent seasonal and spatial variation of a large variety of marine mammals in the Greenland and Barents Seas. Sampling rates were 39 and 78 kHz in the respective locations, and all systems were operational at a duty cycle of 2 min on, 30 min off. The research presents a description of cetacean and pinniped acoustic detections along with a variety of unknown low-frequency tonal sounds, and ambient sound level measurements that fall within the scope of the European Marine Strategy Framework Directive (MSFD). The presented data show the importance of monitoring Arctic underwater biodiversity for assessing ecological changes under the scope of climate change.
169
Cosyns M, Meulemans M, Vermeulen E, Busschots L, Corthals P, Van Borsel J. Measuring Articulation Rate: A Comparison of Two Methods. J Speech Lang Hear Res 2018; 61:2772-2778. [PMID: 30383150] [DOI: 10.1044/2018_jslhr-s-17-0251]
Abstract
PURPOSE Mean articulatory rate (MAR) is an alternative approach to measuring articulation rate and is defined as the mean of 5 rate measures, each taken over a minimum of 10 and a maximum of 20 consecutive syllables of perceptually fluent speech without pauses. This study examined the validity of this approach. METHOD Reading and spontaneous speech samples were collected from 80 typically fluent adults ranging in age from 20 to 59 years. After orthographic transcription, all samples were subjected to an articulation rate analysis, first using the prevailing "global" method, which takes the entire speech sample into account and involves manipulation of the sample, and then again applying the MAR method. Paired-samples t tests were conducted to compare the global measurements to the MAR measurements. RESULTS For both spontaneous speech and reading, a strong correlation was found between the two methods. However, for both speech tasks, the paired-samples t tests revealed a significant difference, with MAR values being higher than the global method values. CONCLUSIONS The MAR method is a valid method to measure articulation rate. However, it cannot be used interchangeably with the prevailing global method. Further standardization of the MAR method is needed before general clinical use can be suggested.
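The MAR definition above can be sketched directly: average five local rate measures, each computed as syllables per second over a short, pause-free stretch. The syllable counts and durations below are hypothetical, for illustration only:

```python
def articulation_rate(n_syllables, duration_s):
    """Local rate for one perceptually fluent, pause-free stretch (syllables/s)."""
    if not 10 <= n_syllables <= 20:
        raise ValueError("MAR uses stretches of 10 to 20 consecutive syllables")
    return n_syllables / duration_s

def mean_articulatory_rate(stretches):
    """MAR: the mean of five local rate measures."""
    if len(stretches) != 5:
        raise ValueError("MAR is defined as the mean of 5 rate measures")
    rates = [articulation_rate(n, d) for n, d in stretches]
    return sum(rates) / len(rates)

# Hypothetical (syllable count, duration in s) pairs for five stretches:
mar = mean_articulatory_rate([(12, 2.4), (15, 3.0), (10, 2.0), (18, 3.6), (14, 2.8)])
print(round(mar, 2))  # 5.0 syllables per second
```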
170
van de Ven M, Ernestus M. The role of segmental and durational cues in the processing of reduced words. Lang Speech 2018; 61:358-383. [PMID: 28870139] [PMCID: PMC6099978] [DOI: 10.1177/0023830917727774]
Abstract
In natural conversations, words are generally shorter than in careful speech and often lack segments. It is unclear to what extent such durational and segmental reductions affect word recognition. The present study investigates to what extent reduction in the initial syllable hinders word comprehension, which types of segments listeners mostly rely on, and whether listeners use word duration as a cue in word recognition. We conducted three experiments in Dutch, in which we adapted the gating paradigm to study the comprehension of spontaneously uttered conversational speech by aligning the gates with the edges of consonant clusters or vowels. Participants heard the context and some segmental and/or durational information from reduced target words with unstressed initial syllables. The initial syllable varied in its degree of reduction, and in half of the stimuli the vowel was not clearly present. Participants gave answers that were too short when they were provided only with durational information from the target words, which shows that listeners are unaware of the reductions that can occur in spontaneous speech. More importantly, listeners required fewer segments to recognize target words when the vowel in the initial syllable was absent. This result strongly suggests that this vowel hardly plays a role in word comprehension, and that its presence may even delay the process. More important are the consonants and the stressed vowel.
171
Ananthakrishnan S, Krishnan A. Human frequency following responses to iterated rippled noise with positive and negative gain: Differential sensitivity to waveform envelope and temporal fine-structure. Hear Res 2018; 367:113-123. [PMID: 30096491] [PMCID: PMC6130915] [DOI: 10.1016/j.heares.2018.07.009]
Abstract
The perceived pitch of iterated rippled noise (IRN) with negative gain (IRNn) is an octave lower than that of IRN with positive gain (IRNp). IRNp and IRNn have identical waveform envelopes (ENV) but differ in waveform temporal fine structure (TFS), which likely accounts for this perceived pitch difference. Here, we examine whether differences in the temporal pattern of phase-locked activity reflected in the human brainstem frequency following response (FFR) elicited by IRNp and IRNn can account for the difference in perceived pitch between the two stimuli. FFRs to a single onset polarity were measured in 13 normal-hearing adult listeners in response to IRNp and IRNn stimuli with 2 ms and 4 ms delays. Autocorrelation functions (ACFs) and fast Fourier transforms (FFTs) were used to evaluate the dominant periodicity and spectral pattern (harmonic spacing) in the phase-locked FFR neural activity. For both delays, the harmonic spacing in the spectra corresponded more strongly with the perceived lowering of pitch from IRNp to IRNn than the ACFs did. These results suggest that the FFR elicited by a single-polarity stimulus reflects phase-locking to both stimulus ENV and TFS. A post hoc experiment evaluating the FFR phase-locked activity to ENV (FFRENV) and TFS (FFRTFS) elicited by IRNp and IRNn confirmed that only the phase-locked activity to the TFS, reflected in FFRTFS, showed differences in both spectra and ACFs that closely matched the pitch difference between the two stimuli. The results of the post hoc experiment suggest that pitch-relevant information is preserved in the temporal pattern of phase-locked activity, and that the differences in stimulus ENV and TFS driving the pitch percepts of IRNp and IRNn are preserved in the brainstem neural response. The scalp-recorded FFR may thus provide a noninvasive analytic tool to evaluate the relative contributions of envelope and temporal fine structure in the neural representation of complex sounds in humans.
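Extracting a dominant periodicity from an autocorrelation function, as done for the FFR waveforms above, can be sketched in a few lines of NumPy; the synthetic 100 Hz test signal and all parameters here are illustrative, not the study's stimuli:

```python
import numpy as np

fs = 8000                          # sampling rate (Hz); illustrative values
t = np.arange(0, 0.2, 1 / fs)      # 200 ms of signal
x = np.sin(2 * np.pi * 100 * t)    # synthetic 100 Hz periodic signal

# Autocorrelation; the lag of the strongest peak beyond lag 0 gives the
# dominant periodicity of the waveform.
acf = np.correlate(x, x, mode="full")[len(x) - 1:]
min_lag = fs // 500                # ignore lags shorter than a 500 Hz period
peak_lag = min_lag + int(np.argmax(acf[min_lag:]))
f0 = fs / peak_lag                 # dominant periodicity in Hz
print(f"dominant periodicity ~ {f0:.1f} Hz")
```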
172
Fishbein AR, Löschner J, Mallon JM, Wilkinson GS. Dynamic sex-specific responses to synthetic songs in a duetting suboscine passerine. PLoS One 2018; 13:e0202353. [PMID: 30157227] [PMCID: PMC6114868] [DOI: 10.1371/journal.pone.0202353]
Abstract
Many bird species produce temporally coordinated duets and choruses, requiring the rapid integration of auditory perception and motor production. While males and females of some species are known to participate in these displays for sex-specific purposes, few studies have identified perceptual features that trigger sex-specific contributions to coordinated song. Especially little is known about perception and production in duetting suboscine passerines, which are thought to have innate songs and largely static, rather than dynamic, vocal behavior. Here, we used synthetic stimuli in a playback experiment on chestnut-backed antbirds (Myrmeciza exsul) to (1) test whether differences in song frequency (Hz) can trigger sex-specific vocal behavior in a suboscine passerine, (2) test for the functions of duetting in males and females of this species, and (3) determine whether these suboscines can dynamically adjust the temporal and spectral features of their songs. We found sex-specific responses to synthetic playback manipulated in song frequency (Hz), providing evidence that, in this context, males sing in duets for general territory defense and females join in for mate-guarding purposes. In addition, we found that the birds altered the frequency, duration, and timing of their songs depending on the frequency of the playback songs. Thus, we show that these birds integrate spectral and temporal information about conspecific songs and actively modulate their responses in sex-specific ways.
173
Cafaro V, Piazzolla D, Melchiorri C, Burgio C, Fersini G, Conversano F, Piermattei V, Marcelli M. Underwater noise assessment outside harbor areas: The case of Port of Civitavecchia, northern Tyrrhenian Sea, Italy. Mar Pollut Bull 2018; 133:865-871. [PMID: 30041388] [DOI: 10.1016/j.marpolbul.2018.06.058]
Abstract
Underwater noise assessment is particularly important in coastal areas, where a wide range of natural and anthropogenic sounds generates complex and variable soundscapes. In the last century, the number and size of noise sources have increased significantly, thereby raising the ocean's background noise. Shipping is the main source of low-frequency underwater noise (<500 Hz). This research aimed to provide an initial assessment of underwater noise levels in a coastal area of the northern Tyrrhenian Sea (Italy) using short-term recordings. Spatial and temporal variations in the noise level, and the type and number of ships sailing through the port, were recorded. A significant correlation was found between ferry boats and sound pressure levels, indicating their role as a prevalent source of low-frequency underwater noise in the project area. This research could provide the baseline for the implementation of distribution and point-source underwater noise models that are required for sustainable coastal management.
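Underwater sound pressure levels like those measured here are conventionally expressed in dB re 1 μPa; a minimal sketch of computing the RMS SPL of a recorded pressure signal (the synthetic tone is illustrative, not the study's data):

```python
import numpy as np

P_REF = 1e-6   # underwater reference pressure: 1 μPa, expressed in Pa

def spl_db_re_1upa(pressure_pa):
    """Root-mean-square sound pressure level in dB re 1 μPa."""
    p_rms = np.sqrt(np.mean(np.square(pressure_pa)))
    return 20 * np.log10(p_rms / P_REF)

# Illustrative signal: a 1 Pa amplitude tone, whose p_rms is 1/sqrt(2) Pa.
fs = 48000
t = np.arange(0, 1.0, 1 / fs)
tone = np.sin(2 * np.pi * 250 * t)
level = spl_db_re_1upa(tone)
print(f"{level:.1f} dB re 1 uPa")  # ~117.0
```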
174
Seidl A, Cristia A, Soderstrom M, Ko ES, Abel EA, Kellerman A, Schwichtenberg AJ. Infant-Mother Acoustic-Prosodic Alignment and Developmental Risk. J Speech Lang Hear Res 2018; 61:1369-1380. [PMID: 29801160] [PMCID: PMC6195085] [DOI: 10.1044/2018_jslhr-s-17-0287]
Abstract
PURPOSE One promising early marker for autism and other communicative and language disorders is early infant speech production. Here we used daylong recordings of high- and low-risk infant-mother dyads to examine whether acoustic-prosodic alignment, as well as two automated measures of infant vocalization, is related to developmental risk status, indexed via familial risk and developmental progress at 36 months of age. METHOD Automated analyses of the acoustics of daylong real-world interactions were used to examine whether the pitch characteristics of one vocalization by the mother or the child predicted those of the response vocalization by the other speaker, and whether other features of infants' speech in the daylong recordings were associated with developmental risk status or outcomes. RESULTS Low-risk and high-risk dyads did not differ in their level of acoustic-prosodic alignment, which was not significant overall. Further analyses revealed that acoustic-prosodic alignment did not predict infants' later developmental progress, which was, however, associated with two automated measures of infant vocalization (daily vocalizations and conversational turns). CONCLUSIONS Although further research is needed, these findings suggest that automated measures of vocalizations drawn from daylong recordings are a possible early identification tool for later developmental progress/concerns. SUPPLEMENTAL MATERIAL https://osf.io/cdn3v/.
175
Thakur A, Abrol V, Sharma P, Rajan P. Local compressed convex spectral embedding for bird species identification. J Acoust Soc Am 2018; 143:3819. [PMID: 29960469] [DOI: 10.1121/1.5042241]
Abstract
This paper proposes a multi-layer alternating sparse-dense framework for bird species identification. The framework takes audio recordings of bird vocalizations and produces compressed convex spectral embeddings (CCSE). Temporal and frequency modulations in bird vocalizations are captured by concatenating frames of the spectrogram, resulting in a high-dimensional and highly sparse super-frame-based representation. Random projections are then used to compress these super-frames. Class-specific archetypal analysis is employed on the compressed super-frames for acoustic modeling, yielding the convex-sparse CCSE representation. This representation efficiently captures species-specific discriminative information. However, many bird species exhibit high intra-species variation in their vocalizations, making it hard to model the whole repertoire of vocalizations appropriately with only one dictionary of archetypes. To overcome this, each class is clustered using Gaussian mixture models (GMMs), and one dictionary of archetypes is learned for each cluster. To calculate the CCSE for any compressed super-frame, one dictionary from each class is chosen using the responsibilities of the individual GMM components. The CCSE obtained using this GMM-archetypal analysis framework is referred to as local CCSE. Experimental results corroborate that local CCSE either outperforms or performs comparably to existing methods, including support vector machines powered by dynamic kernels and deep neural networks.
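The random-projection compression step described above is standard: each high-dimensional, sparse super-frame is multiplied by a random Gaussian matrix to obtain a much shorter vector that approximately preserves pairwise distances. A minimal sketch, with all dimensions chosen for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

d_high, d_low = 5120, 256      # super-frame dim -> compressed dim (illustrative)
n_frames = 100

# Highly sparse super-frames (stand-ins for concatenated spectrogram frames).
super_frames = np.zeros((n_frames, d_high))
for i in range(n_frames):
    idx = rng.integers(0, d_high, size=50)      # ~50 nonzero bins per super-frame
    super_frames[i, idx] = rng.random(50)

# Random Gaussian projection, scaled so squared norms are preserved in expectation.
R = rng.normal(0.0, 1.0 / np.sqrt(d_low), size=(d_high, d_low))
compressed = super_frames @ R
print(compressed.shape)  # (100, 256)
```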
176
Schlesinger JJ, Baum Miller SH, Nash K, Bruce M, Ashmead D, Shotwell MS, Edworthy JR, Wallace MT, Weinger MB. Acoustic features of auditory medical alarms-An experimental study of alarm volume. J Acoust Soc Am 2018; 143:3688. [PMID: 29960450] [PMCID: PMC6910025] [DOI: 10.1121/1.5043396]
Abstract
Audible alarms are a ubiquitous feature of high-paced, high-risk domains, such as aviation and nuclear power, where operators control complex systems. In such settings, a missed alarm can have disastrous consequences. It is conventional wisdom that for alarms to be heard, "louder is better," so alarm levels in operational environments routinely exceed ambient noise levels. In a robust experimental paradigm in an anechoic environment designed to study human response to audible alerting stimuli in a cognitively demanding setting, akin to high-tempo, high-risk domains, clinician participants responded to patient crises while concurrently completing an auditory speech intelligibility task and a visual vigilance distractor task, as the level of the alarms was varied as a signal-to-noise ratio above and below hospital background noise. There was little difference in performance on the primary task when the alarm sound was -11 dB below background noise compared with +4 dB above background noise (a typical real-world situation). Concurrent presentation of the secondary auditory speech intelligibility task significantly degraded performance. Operator performance can thus be maintained with alarms that are softer than background noise. These findings have widespread implications for the design and implementation of alarms across all high-consequence settings.
177
Xenaki A, Bünsow Boldt J, Græsbøll Christensen M. Sound source localization and speech enhancement with sparse Bayesian learning beamforming. J Acoust Soc Am 2018; 143:3912. [PMID: 29960460] [DOI: 10.1121/1.5042222]
Abstract
Speech localization and enhancement involve sound source mapping and reconstruction from noisy recordings of speech mixtures with microphone arrays. Conventional beamforming methods suffer from low resolution, especially with a limited number of microphones. In practice, there are only a few sources compared to the number of possible directions of arrival (DOAs). Hence, DOA estimation is formulated as a sparse signal reconstruction problem and solved with sparse Bayesian learning (SBL). SBL uses hierarchical two-level Bayesian inference to reconstruct sparse estimates from a small set of observations. The first level derives the posterior probability of the complex source amplitudes from the data likelihood and the prior. The second level tunes the prior towards sparse solutions with hyperparameters that maximize the evidence, i.e., the data probability. The adaptive learning of the hyperparameters from the data auto-regularizes the inference problem towards sparse, robust estimates. Simulations and experimental data demonstrate that SBL beamforming provides high-resolution DOA maps, outperforming traditional methods especially for correlated or non-stationary signals. Specifically for speech signals, the high-resolution SBL reconstruction offers not only speech enhancement but also, effectively, speech separation.
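For contrast with SBL, the conventional (Bartlett) beamformer the abstract compares against scans candidate DOAs with steering vectors and picks the power peak. A narrowband sketch for a uniform linear array; the array geometry, source angle, and noise level are all illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

M, snaps = 8, 200          # microphones, snapshots (illustrative)
d = 0.5                    # element spacing in wavelengths (half-wavelength)
theta_true = 20.0          # simulated source DOA in degrees

def steering(theta_deg):
    """ULA steering vector for a plane wave from theta (broadside = 0 deg)."""
    phase = 2j * np.pi * d * np.arange(M) * np.sin(np.radians(theta_deg))
    return np.exp(phase)

# Simulated array snapshots: one source plus white noise.
s = rng.normal(size=snaps) + 1j * rng.normal(size=snaps)
noise = 0.1 * (rng.normal(size=(M, snaps)) + 1j * rng.normal(size=(M, snaps)))
X = np.outer(steering(theta_true), s) + noise

R = X @ X.conj().T / snaps                     # sample covariance matrix
grid = np.arange(-90.0, 90.5, 0.5)             # candidate DOAs
power = [np.real(steering(th).conj() @ R @ steering(th)) for th in grid]
est = grid[int(np.argmax(power))]
print("estimated DOA:", est, "deg")
```

The broad main lobe of this spectrum for small M is exactly the low-resolution behavior that motivates the sparse SBL formulation.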
178
Kastelein RA, Helder-Hoek L, Kommeren A, Covi J, Gransier R. Effect of pile-driving sounds on harbor seal (Phoca vitulina) hearing. J Acoust Soc Am 2018; 143:3583. [PMID: 29960448] [DOI: 10.1121/1.5040493]
Abstract
Seals exposed to intense sounds may suffer hearing loss. After exposure to playbacks of broadband pile-driving sounds, the temporary hearing threshold shift (TTS) of two harbor seals was quantified at 4 and 8 kHz (the frequencies of highest TTS) with a psychoacoustic technique. The pile-driving sounds had a 127 ms pulse duration, 2760 strikes per hour, a 1.3 s inter-pulse interval, a ∼9.5% duty cycle, and an average received single-strike unweighted sound exposure level (SELss) of 151 dB re 1 μPa²s. Exposure durations were 180 and 360 min [cumulative sound exposure levels (SELcum): 190 and 193 dB re 1 μPa²s]. Control sessions were conducted under low ambient noise. TTS occurred only after the 360 min exposures (mean TTS: seal 02, 1-4 min after the sound stopped: 3.9 dB at 4 kHz and 2.4 dB at 8 kHz; seal 01, 12-16 min after the sound stopped: 2.8 dB at 4 kHz and 2.6 dB at 8 kHz). Hearing recovered within 60 min post-exposure. The TTSs were small, due to the small amount of sound energy to which the seals were exposed. The biological TTS onset SELcum for the pile-driving sounds used in this study is around 192 dB re 1 μPa²s (for a mean received SELss of 151 dB re 1 μPa²s and a duty cycle of ∼9.5%).
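The cumulative exposure levels above follow from energy summation: SELcum equals the single-strike SEL plus 10·log10 of the number of strikes. A quick check against the abstract's own numbers:

```python
import math

sel_ss = 151.0                 # mean received single-strike SEL, dB re 1 uPa^2 s
strikes_per_hour = 2760

def sel_cum(sel_single, n_strikes):
    """Cumulative SEL for n equal-level strikes (energy summation)."""
    return sel_single + 10 * math.log10(n_strikes)

levels = {m: sel_cum(sel_ss, strikes_per_hour * m / 60) for m in (180, 360)}
for minutes, level in levels.items():
    print(f"{minutes} min -> SELcum = {level:.0f} dB re 1 uPa^2 s")
# 180 min -> 190 dB and 360 min -> 193 dB, matching the reported SELcum values.
```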
179
Zaunschirm M, Schörkhuber C, Höldrich R. Binaural rendering of Ambisonic signals by head-related impulse response time alignment and a diffuseness constraint. J Acoust Soc Am 2018; 143:3616. [PMID: 29960468] [DOI: 10.1121/1.5040489]
Abstract
Binaural rendering of Ambisonic signals is of great interest in the fields of virtual reality, immersive media, and virtual acoustics. Typically, the spatial order of head-related impulse responses (HRIRs) is considerably higher than the order of the Ambisonic signals. The resulting order reduction of the HRIRs has a detrimental effect on the binaurally rendered signals, and perceptual evaluations indicate limited externalization, reduced localization accuracy, and altered timbre. In this contribution, a binaural renderer computed using a frequency-dependent time alignment of HRIRs, followed by a minimization of the squared error subject to a diffuse-field covariance matrix constraint, is presented. The frequency-dependent time alignment retains the interaural time difference (at low frequencies) and results in an HRIR set with lower spatial complexity, while the constrained optimization controls the diffuse-field behavior. Technical evaluations in terms of sound coloration, interaural level differences, diffuse-field response, and interaural coherence, as well as findings from formal listening experiments, show a significant improvement of the proposed method over state-of-the-art methods.
180
Fan P, Liu X, Liu R, Li F, Huang T, Wu F, Yao H, Liu D. Vocal repertoire of free-ranging adult golden snub-nosed monkeys (Rhinopithecus roxellana). Am J Primatol 2018; 80:e22869. [PMID: 29767431] [PMCID: PMC6032912] [DOI: 10.1002/ajp.22869]
Abstract
Vocal signaling represents a primary mode of communication for most nonhuman primates. A quantitative description of the vocal repertoire is a critical step in in-depth studies of the vocal communication of a particular species, and provides the foundation for comparative studies investigating the selective pressures in the evolution of vocal communication systems. The present study was the first attempt to establish the vocal repertoire of free-ranging adult golden snub-nosed monkeys (Rhinopithecus roxellana) based on quantitative methods. During 8 months in Shennongjia National Park, China, we digitally recorded the vocalizations of adult individuals from a provisioned, free-ranging group of R. roxellana across a variety of social-ecological contexts. We identified 18 call types, which were easily distinguishable by ear, by visual inspection of spectrograms, and by quantitative analysis of acoustic parameters measured from recording samples. We found a marked sex asymmetry in vocal repertoire size (females produced many more call types than males), likely due to sex differences in body size and social role. We found a variety of call types that occurred during various forms of agonistic and affiliative interactions at close range. We made inferences about the functions of particular call types based on the contexts in which they were produced. Studies of vocal communication in R. roxellana are particularly valuable because they provide a case study of how nonhuman primates that inhabit forest habitats and form complex social systems use their vocalizations to interact with their social and ecological environments.
181
Vikram CM, Macha SK, Kalita S, Mahadeva Prasanna SR. Acoustic analysis of misarticulated trills in cleft lip and palate children. J Acoust Soc Am 2018; 143:EL474. [PMID: 29960457] [DOI: 10.1121/1.5042339]
Abstract
In this paper, acoustic analysis of misarticulated trills in cleft lip and palate speakers is carried out using excitation source features, namely the strength of excitation and the fundamental frequency, derived from the zero-frequency filtered signal, and vocal tract system features, namely the first formant frequency (F1) and the trill frequency, derived from linear prediction analysis and an autocorrelation approach, respectively. These features are found to be statistically significant in discriminating normal from misarticulated trills. Using the acoustic features, a dynamic time warping based trill misarticulation detection system is demonstrated. The performance of the proposed system in terms of the F1-score is 73.44%, whereas that of conventional Mel-frequency cepstral coefficients is 66.11%.
182
Peter V, Kalashnikova M, Burnham D. Weighting of Amplitude and Formant Rise Time Cues by School-Aged Children: A Mismatch Negativity Study. J Speech Lang Hear Res 2018; 61:1322-1333. [PMID: 29800360] [DOI: 10.1044/2018_jslhr-h-17-0334]
Abstract
PURPOSE An important skill in the development of speech perception is to apply optimal weights to acoustic cues so that phonemic information is recovered from speech with minimum effort. Here, we investigated the development of acoustic cue weighting of amplitude rise time (ART) and formant rise time (FRT) cues in children, as measured by mismatch negativity (MMN). METHOD Twelve adults and 36 children aged 6-12 years listened to a /ba/-/wa/ contrast in an oddball paradigm in which the standard stimulus had the ART and FRT cues of /ba/. In different blocks, the deviant stimulus had either the ART or the FRT cues of /wa/. RESULTS The results revealed that children younger than 10 years were sensitive to both ART and FRT cues, whereas 10- to 12-year-old children and adults were sensitive only to FRT cues. Moreover, children younger than 10 years generated a positive mismatch response, whereas older children and adults generated MMN. CONCLUSION These results suggest that preattentive, adultlike weighting of ART and FRT cues is attained only by 10 years of age and accompanies the change from the mismatch response to the more mature MMN response. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.6207608.
183
Escera C, López-Caballero F, Gorina-Careta N. The Potential Effect of Forbrain as an Altered Auditory Feedback Device. J Speech Lang Hear Res 2018; 61:801-810. [PMID: 29554188] [DOI: 10.1044/2017_jslhr-s-17-0072]
Abstract
PURPOSE The purpose of this study was to run a proof of concept of a new commercially available device, Forbrain® (Sound For Life Ltd/Soundev, Luxembourg, model UN38.3), to test whether it can modulate the speech of its users. METHOD Participants were instructed to read aloud a text of their choice during 3 experimental phases (baseline, test, and posttest) while wearing a Forbrain® headset. Critically, for half of the participants (the Forbrain group), the device was turned on during the test phase, whereas for the other half (the control group), the device was kept off. Voice recordings were analyzed to derive 6 quantitative measures of voice quality over each phase of the experiment. RESULTS A significant Group × Phase interaction was obtained for the smoothed cepstral peak prominence, a measure of voice harmony, and for the trendline of the long-term average spectrum, a measure of voice robustness, with only the latter surviving Bonferroni correction for multiple comparisons. CONCLUSIONS The results of this study indicate the effectiveness of Forbrain® in modifying the speech of its users. It is suggested that Forbrain® works as an altered auditory feedback device. It may hence be used as a clinical device in speech therapy clinics, yet further studies are warranted to test its usefulness in clinical groups.
184
Jones CA, Duffy MK, Hoffman SA, Schultz-Darken NJ, Braun KM, Ciucci MR, Emborg ME. Vocalization development in common marmosets for neurodegenerative translational modeling. Neurol Res 2018; 40:303-311. [PMID: 29457539] [PMCID: PMC6083835] [DOI: 10.1080/01616412.2018.1438226]
Abstract
Objectives In order to facilitate the study of vocalizations in emerging genetic common marmoset models of neurodegenerative disorders, we aimed to analyze call-type changes across age in a translational research environment. We hypothesized that acoustic parameters of vocalizations would change with age, reflecting growth of the vocal apparatus and the maturation of control needed to make adult-like calls. Methods Nineteen developing common marmosets were longitudinally video- and audio-recorded between 1 and 149 days of age in a naturalistic setting without any vocalization elicitation protocol. Vocalizations were coded for call type (cry, tsik, trill, phee, and trill-phee) and analyzed for duration (s), minimum and maximum frequency (Hz), and bandwidth (Hz). Mixed-model linear regressions were performed to assess the effects of age on the call parameters listed above for each call type. Results Cries decreased in duration (P = 0.038), maximum frequency (P = 0.047), and bandwidth (P = 0.023) with age. Tsik calls decreased in duration (P = 0.002) and increased in minimum frequency (P = 0.004) and maximum frequency (P = 0.005) with age. Trill calls increased in duration (P = 0.003), and trill-phee bandwidth (P = 0.031) decreased with age. Discussion Our results demonstrate that the development of common marmoset vocalizations is call-type dependent and that changes in acoustic parameters can be detected without complex vocalization elicitation paradigms or specialized audio recording equipment. Thus, we demonstrate the feasibility of a naturalistic protocol to collect and objectively analyze marmoset vocalizations longitudinally. This approach may be useful for studying vocal communication deficits in genetic models of neurodegenerative disorders.
185
Postma BNJ, Jouan S, Katz BFG. Pre-Sabine room acoustic design guidelines based on human voice directivity. J Acoust Soc Am 2018; 143:2428. [PMID: 29716287] [DOI: 10.1121/1.5032201]
Abstract
With the work of Wallace C. Sabine on the lecture hall of the Fogg Art Museum and the concert hall of Boston Symphony Hall, a foundation for the field of architectural acoustics as a science was laid between 1895 and 1900. Prior to that, architects employed various notions in acoustic design. Previous studies by the authors reviewed 18th- and 19th-century design guidelines based on quantifying the perception threshold between the direct sound and first-order reflections, guidelines that were followed in the design of several rooms with acoustical demands. This study reviews an alternative guideline, based on the directivity and propagation distance of the human voice, which was also used in several halls during the 18th and 19th centuries. The related acoustic experiments tested how far sound was perceivable towards the front, sides, and rear of a speaking person. These ratios were used in the acoustical design of at least five lecture halls, four theater halls, one opera hall, and one concert hall, constructed in Germany, England, and the USA. These historic designs, and comparisons to modern measures and guidelines, are reviewed.
186
Gridley T, Silva MFP, Wilkinson C, Seakamela SM, Elwen SH. Song recorded near a super-group of humpback whales on a mid-latitude feeding ground off South Africa. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:EL298. [PMID: 29716258 DOI: 10.1121/1.5032126] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Humpback whales (Megaptera novaeangliae) are well known for their complex song which is culturally transmitted and produced by males. However, the function of singing behavior remains poorly understood. Song was observed from 57 min of acoustic recording in the presence of feeding humpback whales aggregated in the near-shore waters on the west coast of South Africa. The structural organization of the song components, lack of overlap between song units, and consistency in relative received level suggest the song was produced by one "singer." The unusual timing and location of song production adds further evidence of plasticity in song production.
187
Andrès E, Gass R, Charloux A, Brandt C, Hentzler A. Respiratory sound analysis in the era of evidence-based medicine and the world of medicine 2.0. J Med Life 2018; 11:89-106. [PMID: 30140315 PMCID: PMC6101681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2018] [Accepted: 04/10/2018] [Indexed: 12/03/2022] Open
Abstract
OBJECTIVE This paper describes the state of the art, scientific publications, and ongoing research related to methods of analysis of respiratory sounds. METHODS AND MATERIAL Narrative review of the current medical and technological literature using PubMed and personal experience. RESULTS We outline the various techniques that are currently being used to collect auscultation sounds and provide a physical description of known pathological sounds for which automatic detection tools have been developed. Modern tools are based on artificial intelligence and on techniques such as artificial neural networks, fuzzy systems, and genetic algorithms. CONCLUSION The next step will consist of finding new markers to increase the efficiency of decision-aiding algorithms and tools.
188
Allison KM, Hustad KC. Acoustic Predictors of Pediatric Dysarthria in Cerebral Palsy. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:462-478. [PMID: 29466556 PMCID: PMC5963041 DOI: 10.1044/2017_jslhr-s-16-0414] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2016] [Revised: 05/18/2017] [Accepted: 10/20/2017] [Indexed: 05/19/2023]
Abstract
PURPOSE The objectives of this study were to identify acoustic characteristics of connected speech that differentiate children with dysarthria secondary to cerebral palsy (CP) from typically developing children and to identify acoustic measures that best detect dysarthria in children with CP. METHOD Twenty 5-year-old children with dysarthria secondary to CP were compared to 20 age- and sex-matched typically developing children on 5 acoustic measures of connected speech. A logistic regression approach was used to derive an acoustic model that best predicted dysarthria status. RESULTS Results indicated that children with dysarthria secondary to CP differed from typically developing children on measures of multiple segmental and suprasegmental speech characteristics. An acoustic model containing articulation rate and the F2 range of diphthongs differentiated children with dysarthria from typically developing children with 87.5% accuracy. CONCLUSION This study serves as a first step toward developing an acoustic model that can be used to improve early identification of dysarthria in children with CP.
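The two-predictor acoustic model (articulation rate plus diphthong F2 range) can be sketched with a standard logistic regression; the group means and spreads below are invented for illustration and are not the study's values:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20  # children per group, mirroring the study's group sizes

# Hypothetical predictor distributions: slower articulation rate (syll/sec) and
# reduced diphthong F2 range (Hz) in the dysarthria group.
rate_td, f2_td = rng.normal(3.5, 0.4, n), rng.normal(900, 120, n)    # typically developing
rate_dys, f2_dys = rng.normal(2.3, 0.4, n), rng.normal(550, 120, n)  # dysarthria

X = np.column_stack([np.r_[rate_td, rate_dys], np.r_[f2_td, f2_dys]])
y = np.r_[np.zeros(n), np.ones(n)]  # 0 = typically developing, 1 = dysarthria

model = LogisticRegression(max_iter=2000).fit(X, y)
accuracy = model.score(X, y)  # in-sample classification accuracy
print(f"accuracy: {accuracy:.3f}")
```

With well-separated synthetic groups the in-sample accuracy is high; the published 87.5% figure comes from the real, noisier data.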
189
Linn SN, Boeer M, Scheumann M. First insights into the vocal repertoire of infant and juvenile Southern white rhinoceros. PLoS One 2018; 13:e0192166. [PMID: 29513670 PMCID: PMC5841651 DOI: 10.1371/journal.pone.0192166] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2017] [Accepted: 01/17/2018] [Indexed: 11/22/2022] Open
Abstract
Describing vocal repertoires represents an essential step towards gaining an overview of the complexity of acoustic communication in a given species. The analysis of infant vocalisations is essential for understanding the development and usage of species-specific vocalisations, but is often underrepresented, especially in species with long inter-birth intervals such as the white rhinoceros. Thus, this study aimed for the first time to characterise the infant and juvenile vocal repertoire of the Southern white rhinoceros and to relate these findings to the adult vocal repertoire. The behaviour of seven mother-reared white rhinoceros calves (two males, five females) and one hand-reared calf (male), ranging in age from one month to four years, was simultaneously audio- and video-taped at three zoos. Normally reared infants and juveniles uttered four discriminable call types (Whine, Snort, Threat, and Pant) that were produced in different behavioural contexts. All call types were also uttered by the hand-reared calf. Call rates of Whines, but not of the other call types, decreased with age. These findings provide the first evidence that infant and juvenile rhinoceros utter specific call types in distinct contexts, even if they grow up with limited social interaction with conspecifics. By comparing our findings with the current literature on vocalisations of adult white rhinoceros and other solitary rhinoceros species, we discuss to what extent differences in social lifestyle across species affect acoustic communication in mammals.
190
Le LN, Jones DL. Tensorial dynamic time warping with articulation index representation for efficient audio-template learning. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1548. [PMID: 29604702 DOI: 10.1121/1.5027245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Audio classification techniques often depend on the availability of a large labeled training dataset for successful performance. However, in many application domains of audio classification (e.g., wildlife monitoring), obtaining labeled data is still a costly and laborious process. Motivated by this observation, a technique is proposed to efficiently learn a clean template from a few labeled, but likely corrupted (by noise and interferences), data samples. This learning can be done efficiently via tensorial dynamic time warping on the articulation index-based time-frequency representations of audio data. The learned template can then be used in audio classification following the standard template-based approach. Experimental results show that the proposed approach outperforms both (1) the recurrent neural network approach and (2) the state-of-the-art in the template-based approach on a wildlife detection application with few training samples.
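The core alignment step, dynamic time warping, can be illustrated in its classic scalar form (the paper's tensorial variant operates on articulation-index time-frequency representations rather than the 1-D sequences used here):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of the template aligns far more cheaply than an
# unrelated signal, which is what makes DTW useful for template learning.
template = np.sin(np.linspace(0, 2 * np.pi, 50))
stretched = np.sin(np.linspace(0, 2 * np.pi, 80))
unrelated = np.cos(np.linspace(0, 6 * np.pi, 80))

print(dtw_distance(template, stretched) < dtw_distance(template, unrelated))  # True
```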
191
Yang J, Qian J, Chen X, Kuehnel V, Rehmann J, von Buol A, Li Y, Ren C, Liu B, Xu L. Effects of nonlinear frequency compression on the acoustic properties and recognition of speech sounds in Mandarin Chinese. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1578. [PMID: 29604675 DOI: 10.1121/1.5027404] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The present study examined the change in spectral properties of Mandarin vowels and fricatives caused by nonlinear frequency compression (NLFC) used in hearing instruments and how these changes affect the perception of speech sounds in normal-hearing listeners. Speech materials, including a list of Mandarin monosyllables in the form of /dV/ (12 vowels) and /Ca/ (five fricatives), were recorded from 20 normal-hearing, native Mandarin-speaking adults (ten males and ten females). NLFC was based on Phonak SoundRecover algorithms. The speech materials were processed with six different NLFC parameter settings. Detailed acoustic analysis revealed that the high front vowel /i/ and certain compound vowels containing /i/ demonstrated positional deviation in certain processed conditions in comparison to the unprocessed condition. All five fricatives showed acoustic changes in spectral features in all processed conditions. Fourteen Mandarin-speaking, normal-hearing adult listeners performed phoneme recognition with the six NLFC processing conditions. When the cut-off frequency was set relatively low, recognition of /s/ was detrimentally affected, whereas none of the NLFC processing configurations affected the other phonemes. The discrepancy between the considerable acoustic changes and the negligible adverse effects on perceptual outcomes is partially accounted for by the phonology system and phonotactic constraints in Mandarin.
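Phonak's SoundRecover algorithm is proprietary, but the general shape of nonlinear frequency compression, leaving frequencies below a cut-off intact and compressing those above it toward the cut-off, can be sketched as a piecewise mapping (the cut-off and compression ratio below are arbitrary, not SoundRecover settings):

```python
import numpy as np

def compress_frequency(f, cutoff=2000.0, ratio=3.0):
    """Piecewise-linear frequency compression: frequencies at or below the
    cut-off pass unchanged; those above are compressed toward the cut-off."""
    f = np.asarray(f, dtype=float)
    return np.where(f <= cutoff, f, cutoff + (f - cutoff) / ratio)

# High-frequency frication energy of /s/ (~8 kHz) is relocated downward,
# into the residual hearing range of a listener with high-frequency loss.
print(compress_frequency([500.0, 2000.0, 8000.0]))
```

A low cut-off relocates the /s/ frication band most aggressively, which is consistent with the abstract's finding that only low cut-off settings hurt /s/ recognition.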
192
O'Reilly C, Analuddin K, Kelly DJ, Harte N. Measuring vocal difference in bird population pairs. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1658. [PMID: 29604681 DOI: 10.1121/1.5027244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Over time, a bird population's acoustic and morphological features can diverge from those of the parent species. A quantitative measure of the difference between two populations of a species/subspecies is extremely useful to zoologists. Work in this paper takes a dialect difference system first developed for speech and refines it to automatically measure vocalisation difference between bird populations by extracting pitch contours. The pitch contours are transposed into pitch codes. A variety of codebook schemes are proposed to represent the contour structure, including a vector quantization approach. The measure, called Bird Vocalisation Difference, is applied to bird populations with calls that are considered very similar, very different, and between these two extremes. Initial results are very promising, with the behaviour of the metric consistent with accepted levels of similarity for the populations tested to date. The influence of data size on the measure is investigated by using reduced datasets. Results of species pair classification using Gaussian mixture models with Mel-frequency cepstral coefficients are also given as a baseline indicator of class confusability.
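A minimal sketch of the pitch-code idea, assuming contour frames are vector-quantized into a shared codebook and populations are compared by their code-usage histograms (the synthetic contours and the L1 histogram distance are illustrative simplifications, not the published Bird Vocalisation Difference measure):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

def contours(base, n=30, length=40):
    """Synthetic rising pitch contours centred on a population-specific base (Hz)."""
    t = np.linspace(0, 1, length)
    return np.array([base * (1 + 0.3 * t) + rng.normal(0, 20, length) for _ in range(n)])

pop_a, pop_b = contours(2000), contours(3000)  # two hypothetical populations

# Vector-quantize short contour frames into a shared codebook of "pitch codes".
frames = np.vstack([pop_a, pop_b]).reshape(-1, 5)  # 5-sample frames
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(frames)

def code_histogram(pop):
    codes = codebook.predict(pop.reshape(-1, 5))
    return np.bincount(codes, minlength=8) / codes.size

# A simple vocal-difference score: L1 distance between code-usage histograms
# (0 = identical usage, 2 = fully disjoint usage).
difference = np.abs(code_histogram(pop_a) - code_histogram(pop_b)).sum()
print(f"difference: {difference:.2f}")
```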
193
Yu JF, Hsien HC, Lee KC, Xiao JH, Chiu HH, Hong HH, Shen YF, Peng YC. Effect of mouth-opening levels on sound field gain in the ear canal. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1451. [PMID: 29604713 DOI: 10.1121/1.5026692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The human external auditory canal can become deformed when the mandible moves, and this changes the sound field in the external auditory canal. This study measured the sound field gain in the external auditory canal while varying mouth-opening across three levels. The mandible was fixed at the 1/3, the 2/3, and the maximal mouth-opening levels. Seven 65-dB tones at 200, 500, 1000, 2000, 4000, 6000, and 8000 Hz, which correspond to the sound pressure level and frequency range of speech at a normal conversational level, were adopted as the sound stimuli to measure sound field gains at depths of 5, 10, 15, and 20 mm into the external auditory canal. The results show that, with the exception of the 1.25 dB decrease from 12.96 to 11.71 dB at a depth of 5 mm with a stimulus at 8000 Hz, the differences in the sound field gain at the other depths and stimulus frequencies were within 1 dB and were not statistically significant. These results suggest that mouth-opening level has no effect on the measurement of the sound field in the external auditory canal.
194
Maruthy S, Feng Y, Max L. Spectral Coefficient Analyses of Word-Initial Stop Consonant Productions Suggest Similar Anticipatory Coarticulation for Stuttering and Nonstuttering Adults. LANGUAGE AND SPEECH 2018; 61:31-42. [PMID: 29280401 PMCID: PMC5747557 DOI: 10.1177/0023830917695853] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
A longstanding hypothesis about the sensorimotor mechanisms underlying stuttering suggests that stuttered speech dysfluencies result from a lack of coarticulation. Formant-based measures of either the stuttered or fluent speech of children and adults who stutter have generally failed to obtain compelling evidence in support of the hypothesis that these individuals differ in the timing or degree of coarticulation. Here, we used a sensitive acoustic technique, spectral coefficient analyses, that allowed us to compare stuttering and nonstuttering speakers with regard to vowel-dependent anticipatory influences as early as the onset burst of a preceding voiceless stop consonant. Eight adults who stutter and eight matched adults who do not stutter produced C1VC2 words, and the first four spectral coefficients were calculated for one analysis window centered on the burst of C1 and two subsequent windows covering the beginning of the aspiration phase. Findings confirmed that the combined use of four spectral coefficients is an effective method for detecting the anticipatory influence of a vowel on the initial burst of a preceding voiceless stop consonant. However, the observed patterns of anticipatory coarticulation showed no statistically significant differences, or trends toward such differences, between the stuttering and nonstuttering groups. Combining the present results for fluent speech in one given phonetic context with prior findings from both stuttered and fluent speech in a variety of other contexts, we conclude that there is currently no support for the hypothesis that the fluent speech of individuals who stutter is characterized by limited coarticulation.
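Spectral coefficient analyses of a burst window are commonly computed as the first four spectral moments of the magnitude spectrum; a sketch of that computation follows (the paper's exact coefficient definition may differ, and the white-noise input is a stand-in for a real burst):

```python
import numpy as np

def spectral_moments(frame, fs=22050):
    """First four spectral moments (mean, variance, skewness, kurtosis),
    treating the windowed magnitude spectrum as a probability mass function."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spectrum / spectrum.sum()
    m1 = (freqs * p).sum()                           # spectral mean (centroid)
    m2 = ((freqs - m1) ** 2 * p).sum()               # variance
    m3 = ((freqs - m1) ** 3 * p).sum() / m2 ** 1.5   # skewness
    m4 = ((freqs - m1) ** 4 * p).sum() / m2 ** 2     # kurtosis
    return m1, m2, m3, m4

rng = np.random.default_rng(4)
burst = rng.standard_normal(512)  # white-noise stand-in for a ~23 ms burst window
m1, m2, m3, m4 = spectral_moments(burst)
print(f"centroid: {m1:.0f} Hz")
```

Comparing these four numbers across vowel contexts, window by window, is what lets the analysis detect vowel-dependent anticipation already at the burst.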
195
Senan TU, Jelfs S, Kohlrausch A. Cognitive disruption by noise-vocoded speech stimuli: Effects of spectral variation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1407. [PMID: 29604682 DOI: 10.1121/1.5026619] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
The effect of irrelevant sounds on short-term memory was investigated in two experiments using noise-vocoded speech stimuli (NVSS). Speech samples were systematically modified by a noise-vocoder, creating a set of stimuli varying from amplitude-modulated white noise to intelligible speech. Eight NVSS conditions, composed of 1, 2, 4, 6, 9, 12, 15, and 18 bands, were used as the distracting stimuli in a digit-recall task, in addition to speech and silence conditions. The results showed that performance decreased with the number of frequency bands up to the 6-band condition, but there was no influence of the number of bands on performance beyond six bands. The results were analyzed using four acoustic metrics proposed in the literature: the frequency domain correlation coefficient (FDCC), the fluctuation strength, the speech transmission index (STI), and the normalized covariance measure (NCM). None of the metrics successfully predicted the results. However, the parameter values of the FDCC, the STI, and the NCM indicated that a prediction model for the irrelevant sound effect should account for both temporal and spectral features of the irrelevant sounds.
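A minimal noise vocoder of the kind used to generate NVSS, band-splitting a signal, extracting each band's envelope, and re-imposing it on band-limited noise, might look like this (filter order, band edges, and the synthetic amplitude-modulated tone standing in for speech are all assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
t = np.arange(fs) / fs  # one second
# Crude amplitude-modulated tone as a stand-in for a recorded speech sample.
speechlike = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 4 * t))

def noise_vocode(x, n_bands, fs=16000, lo=100.0, hi=7000.0, seed=3):
    """Noise-vocode a signal: split it into log-spaced bands, extract each
    band's Hilbert envelope, and re-impose it on band-limited white noise."""
    rng = np.random.default_rng(seed)
    edges = np.geomspace(lo, hi, n_bands + 1)
    out = np.zeros_like(x)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        envelope = np.abs(hilbert(sosfiltfilt(sos, x)))
        out += envelope * sosfiltfilt(sos, rng.standard_normal(len(x)))
    return out

vocoded = noise_vocode(speechlike, n_bands=6)
print(vocoded.shape)
```

More bands preserve more spectral detail of the original, which is the dimension the study varied from 1 to 18 bands.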
196
Spinu L. Investigating the status of a rare cross-linguistic contrast: The case of Romanian palatalized postalveolars. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1235. [PMID: 29604669 DOI: 10.1121/1.5024350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
This study examines a rare cross-linguistic contrast, that between plain and secondarily palatalized postalveolar fricatives, through (i) an acoustic analysis of the production of 31 Romanian speakers, and (ii) a perception experiment with a different group of 31 native speakers. Evidence of acoustic separation between plain and palatalized forms was found for 27 of the subjects, suggesting that the contrast is produced by the majority. This is consistent with previous reports of native speakers collected in 1961. These findings were supported by the results of the perceptual experiment, which showed that native speakers exhibit moderate sensitivity to this contrast. An examination of each of the two genders' production separately suggests that a process of neutralization may be in progress, more strongly realized by males compared to females. Aside from documenting this phenomenon in Romanian, an explanation is sought for its longevity, and it is proposed that grammatical restructuring offers the best account for the observed facts.
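"Moderate sensitivity" in such perception experiments is typically quantified with the signal-detection index d'; a sketch, with invented hit and false-alarm rates (the paper's actual analysis may use a different statistic):

```python
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity index: d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Invented rates illustrating roughly "moderate" sensitivity to a contrast.
print(f"d' = {d_prime(0.75, 0.35):.2f}")
```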
197
Liker M, Gibbon FE. Tongue-Palate Contact Timing during /s/ and /z/ in English. PHONETICA 2018; 75:110-131. [PMID: 29433122 DOI: 10.1159/000479880] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/09/2016] [Accepted: 06/28/2017] [Indexed: 06/08/2023]
Abstract
Although numerous studies have investigated supraglottal strategies for signalling voicing in fricatives, there is still no agreement about the precise characteristics of tongue-to-palate contact timing during voiced as opposed to voiceless fricatives. In this study we use electropalatography (EPG) to investigate articulatory and coarticulatory characteristics of tongue-to-palate contact timing during /s/ and /z/ in English. Five typically speaking participants, speakers of Southern British English, produced 500 trochaic words containing the intervocalic alveolar fricatives /s/ or /z/. The time between the start of the frication and the maximum contact at the place of articulation was expressed as a percentage of each fricative's total duration (time to target, TT). This measure was used to analyse articulatory and coarticulatory timing during /s/ and /z/. Data for absolute timing were also presented. The results showed that the time between the start of the frication and the maximum contact point was longer for /s/ than for /z/. This difference was consistent across speakers but was not significant for all of them. The results of the coarticulatory effects showed that the influence of vowel context on TT values for /s/ and /z/ did not differ significantly, but there was a tendency for /z/ to be more resistant to coarticulation effects than /s/.
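The TT measure reduces to a simple proportion; a sketch with hypothetical timestamps:

```python
def time_to_target(frication_start, max_contact_time, frication_end):
    """Time to target (TT): latency from frication onset to maximum
    tongue-palate contact, as a percentage of total fricative duration."""
    duration = frication_end - frication_start
    return 100.0 * (max_contact_time - frication_start) / duration

# e.g. a 120 ms fricative reaching maximum contact 48 ms after frication onset
print(time_to_target(0.0, 48.0, 120.0))  # 40.0
```

Expressing the latency as a percentage rather than in milliseconds makes TT comparable across fricatives of different absolute durations.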
198
Parrell B, Narayanan S. Explaining Coronal Reduction: Prosodic Structure and Articulatory Posture. PHONETICA 2018; 75:151-181. [PMID: 29433121 PMCID: PMC5892835 DOI: 10.1159/000481099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/28/2016] [Accepted: 08/28/2017] [Indexed: 06/08/2023]
Abstract
Consonant reduction is often treated as an allophonic process at the phonological planning level, with one production target (allophone) being substituted for another. We propose that, alternatively, reduction can be the result of an online process driven by prosodically conditioned durational variability and an invariant production target. We show that this approach can account for patterns of coronal stop (/t/, /d/, and /n/) production in both American English and Spanish. Contrary to effort-driven theories of reduction, we show that reduction does not depend on changes to gestural stiffness. Moreover, we demonstrate how differences between and within a language in the particular articulatory postures used to produce different coronal stops automatically lead to reduction to what have normally been considered distinct allophones: coronal approximants ([ð̞]) and flaps ([ɾ]). In this way, our approach allows us to understand different outcomes of coronal stop reduction as the dynamic interaction of a single process (durationally driven undershoot) and variable spatial targets. We show that these patterns are reflected across a wide variety of languages, and show how alternative outcomes of reduction may fit within the same general framework.
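The proposal that undershoot falls out of shortened durations with an invariant target can be illustrated with a critically damped point-attractor gesture in the style of task dynamics (the stiffness value and durations below are arbitrary, chosen only to show the effect):

```python
import numpy as np

def gesture_peak(target, duration, omega=60.0, dt=1e-4):
    """Critically damped point-attractor gesture driven toward `target` for
    `duration` seconds with invariant stiffness (omega); returns the
    displacement reached when the gesture's time runs out (Euler integration)."""
    x, v = 0.0, 0.0
    for _ in range(int(duration / dt)):
        a = omega ** 2 * (target - x) - 2.0 * omega * v  # spring - damping
        v += a * dt
        x += v * dt
    return x

full = gesture_peak(1.0, duration=0.150)     # prosodically long: near-complete closure
reduced = gesture_peak(1.0, duration=0.040)  # prosodically short: spatial undershoot
print(f"long: {full:.2f}  short: {reduced:.2f}")
```

Same target, same stiffness, different durations: the short gesture simply runs out of time before reaching closure, which is the paper's durationally driven undershoot.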
199
Pulkki V, Lähivaara T, Huhtakallio I. Effects of flow gradients on directional radiation of human voice. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:1173. [PMID: 29495729 DOI: 10.1121/1.5025063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
In voice communication in windy outdoor conditions, complex velocity gradients appear in the flow field around the source, the receiver, and also in the atmosphere. It is commonly known that the voice projects more strongly in the downstream direction than in the upstream direction. In the literature, atmospheric effects are used to explain the stronger emanation in the downstream direction. This work shows that the wind also has an effect on the directivity of the voice itself, likewise favouring the downstream direction. The effect is addressed by measurements and simulations. Laboratory measurements are conducted using a large pendulum with a loudspeaker mimicking the human head, whereas practical measurements utilizing the human voice are realized by placing a subject through the roof window of a moving car. The measurements and a simulation indicate congruent results in the speech frequency range: when the source faces the downstream direction, stronger radiation coinciding with the wind direction is observed, and when it faces the upstream direction, radiation is not affected notably. The simulated flow gradients show a wake region in the downstream direction, and the simulated acoustic field in the flow shows that this region causes a wave-guide effect focusing the sound in that direction.
200
Biswal M, Mishra SK. Comparison of time-frequency methods for analyzing stimulus frequency otoacoustic emissions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:626. [PMID: 29495731 PMCID: PMC5796829 DOI: 10.1121/1.5022783] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/12/2023]
Abstract
Stimulus frequency otoacoustic emissions (SFOAEs) can have multiple time-varying components, including multiple internal reflections. It is, therefore, necessary to study SFOAEs using techniques that can represent their time-frequency behavior. Although various time-frequency schemes can be applied to identify and filter SFOAE components, their accuracy for SFOAE analysis has not been investigated. The relative performance of these methods is important for accurate characterization of SFOAEs that may, in turn, enhance the understanding of SFOAE generation. This study used in silico experiments to examine the performance of three linear (short-time Fourier transform, continuous wavelet transform, Stockwell transform) and two nonlinear (empirical mode decomposition and synchrosqueezed wavelet transform) time-frequency approaches for SFOAE analysis. Their performances in terms of phase-gradient delay estimation, frequency specificity, and spectral component extraction are compared, and the relative merits and limitations of each method are discussed. Overall, this paper provides a comparative analysis of various time-frequency methods useful for otoacoustic emission applications.
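Of the linear methods compared, the short-time Fourier transform is the simplest; a sketch of extracting the dominant component of a synthetic single-component emission (a pure tone standing in for real SFOAE data):

```python
import numpy as np
from scipy.signal import stft

fs = 44100
t = np.arange(int(0.05 * fs)) / fs
tone = np.sin(2 * np.pi * 2000 * t)  # pure-tone stand-in for one emission component

# Short-time Fourier transform: frequency bins x time frames.
f, times, Z = stft(tone, fs=fs, nperseg=512)
ridge = f[np.abs(Z).mean(axis=1).argmax()]  # frequency bin carrying the most energy
print(f"dominant component near {ridge:.0f} Hz")
```

With multiple overlapping, frequency-dependent-latency components, as in real SFOAEs, the fixed time-frequency resolution of the STFT is exactly the limitation the wavelet- and synchrosqueezing-based alternatives address.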