1
Tan X, Chen J, Liu H, Cong J, Zhang C, Liu Y, Wang X, Leng Y, Yi Y, He L, Zhao S, Qin T, Soong F, Liu TY. NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality. IEEE Trans Pattern Anal Mach Intell 2024; 46:4234-4245. [PMID: 38241115] [DOI: 10.1109/tpami.2024.3356232]
Abstract
Text-to-speech (TTS) has made rapid progress in both academia and industry in recent years. Several questions naturally arise: whether a TTS system can achieve human-level quality, how to define and judge that quality, and how to achieve it. In this paper, we answer these questions by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on benchmark datasets. Specifically, we leverage a variational auto-encoder (VAE) for end-to-end text-to-waveform generation, with several key modules to enhance the capacity of the prior from text and reduce the complexity of the posterior from speech, including phoneme pre-training, differentiable duration modeling, bidirectional prior/posterior modeling, and a memory mechanism in the VAE. Experimental evaluations on the popular LJSpeech dataset show that NaturalSpeech achieves a -0.01 CMOS (comparative mean opinion score) relative to human recordings at the sentence level, with a Wilcoxon signed-rank test p-value well above 0.05, demonstrating for the first time no statistically significant difference from human recordings.
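The human-level criterion above rests on a paired significance test over listener scores. A minimal sketch of that CMOS comparison, using the Wilcoxon signed-rank test on synthetic, hypothetical ratings (not the paper's data or code):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Hypothetical paired listener scores for the same 50 sentences,
# rated once for human recordings and once for synthesized speech.
human = rng.normal(4.0, 0.5, size=50)
system = human + rng.normal(0.0, 0.2, size=50)  # near-identical quality

cmos = float(np.mean(system - human))  # comparative mean opinion score
stat, p = wilcoxon(system, human)      # paired, non-parametric test

print(f"CMOS = {cmos:+.3f}, Wilcoxon p = {p:.3f}")
# A p-value well above 0.05 indicates no statistically
# significant difference between the paired ratings.
```

The non-parametric Wilcoxon test is the natural choice here because opinion scores are ordinal and their differences need not be normally distributed.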
2
Budnevsky AV, Ovsyannikov ES, Avdeev SN, Choporov ON, Feigelman SN, Maksimov AV. [The role of spectral analysis of cough sounds in the diagnosis of COVID-19]. TERAPEVT ARKH 2024; 96:228-232. [PMID: 38713036] [DOI: 10.26442/00403660.2024.03.202636]
Abstract
AIM To evaluate the potential of spectral analysis of cough sounds in the diagnosis of the novel coronavirus infection COVID-19. MATERIALS AND METHODS Spectral toussophonobarography was performed in 218 patients with COVID-19 [48.56% men, 51.44% women, average age 40.2 (32.4; 51.0) years] and in 60 healthy individuals [50% men, 50% women, average age 41.7 (32.2; 53.0) years], with cough induced by inhalation of a citric acid solution (20 g/l) through a nebulizer. Recordings were made with a contact microphone mounted on a tripod 15-20 cm from the subject's face, processed in a computer program, and analysed spectrally using Fourier transform algorithms. The following parameters of the cough sounds were evaluated: the duration of the cough act (ms), the ratio of low-frequency (60-600 Hz) to high-frequency (600-6000 Hz) energy, and the frequency of maximum cough-sound energy (Hz). RESULTS Statistical processing showed that the cough-sound parameters of COVID-19 patients differ from those of healthy individuals. The measured parameters were substituted into the developed regression equation; rounded to an integer, the result was interpreted as "0", no COVID-19, or "1", COVID-19 present. CONCLUSION The technique showed high sensitivity and specificity. It is also simple to use and requires no expensive equipment, so it can be applied in practice for the timely diagnosis of COVID-19.
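The three spectral parameters described above are straightforward to compute from a recording. A minimal sketch, with a synthetic cough burst standing in for a real recording (the sampling rate and the signal model are assumptions, not the authors' setup):

```python
import numpy as np
from scipy.signal import periodogram

fs = 12_000  # Hz, assumed sampling rate
t = np.arange(int(0.3 * fs)) / fs  # 300 ms cough act
rng = np.random.default_rng(1)
# Synthetic cough: decaying broadband noise plus a low-frequency component.
x = np.exp(-20 * t) * (rng.normal(size=t.size) + 3 * np.sin(2 * np.pi * 150 * t))

f, pxx = periodogram(x, fs)
low = pxx[(f >= 60) & (f < 600)].sum()       # 60-600 Hz band energy
high = pxx[(f >= 600) & (f <= 6000)].sum()   # 600-6000 Hz band energy
ratio = low / high                           # low/high energy ratio
f_peak = f[np.argmax(pxx)]                   # frequency of maximum energy (Hz)
duration_ms = 1000 * t.size / fs             # duration of the cough act (ms)
print(round(ratio, 2), f_peak, duration_ms)
```

In the study these features feed a regression equation whose rounded output is read as a binary COVID-19 indicator; any such equation would be fit on labelled data, which this sketch does not attempt.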
Affiliation(s)
- S N Avdeev
- Sechenov First Moscow State Medical University (Sechenov University)
3
Yoshinaga T, Nozaki K, Kondo O, Iida A. Estimation of sibilant groove formation and sound generation from early hominin jawbones. JASA Express Lett 2022; 2:045203. [PMID: 36154226] [DOI: 10.1121/10.0010209]
Abstract
The speech production capability for sibilant fricatives in early hominins was assessed by interpolating the modern human vocal tract onto an Australopithecine specimen based on jawbone landmarks, and then simulating airflow and sound generation. The landmark interpolation demonstrates that a sibilant groove could be formed in the anterior part of the oral tract, and the aeroacoustic simulation indicates that early hominins had the potential to produce fricative broadband noise given a constant supply of airflow to the oral cavity, although the ancestors' tongue deformation ability remains uncertain and the results are highly speculative.
Affiliation(s)
- Tsukasa Yoshinaga
- Department of Mechanical Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku, Toyohashi 441-8580, Japan
- Kazunori Nozaki
- Division of Medical Information, Osaka University Dental Hospital, 1-8 Yamadaoka, Suita 565-0871, Japan
- Osamu Kondo
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Akiyoshi Iida
- Department of Mechanical Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tempaku, Toyohashi 441-8580, Japan
4
Wilcock WSD, Hilmo RS. A method for tracking blue whales (Balaenoptera musculus) with a widely spaced network of ocean bottom seismometers. PLoS One 2021; 16:e0260273. [PMID: 34910750] [PMCID: PMC8673649] [DOI: 10.1371/journal.pone.0260273]
Abstract
Passive acoustic monitoring is an important tool for studying marine mammals. Ocean bottom seismometer networks provide data sets of opportunity for studying blue whales (Balaenoptera musculus), which vocalize extensively at seismic frequencies. We describe methods to localize calls and obtain tracks using the B calls of northeast Pacific blue whales recorded by a large network of widely spaced ocean bottom seismometers off the coast of the Pacific Northwest. The first harmonic of the B call, at ~15 Hz, is detected using spectrogram cross-correlation. The seasonality of calls, inferred from a dataset of analyst-identified calls, is used to estimate the probability that detections are true positives as a function of detection strength. Because the seismometer spacing reaches 70 km, faint detections with a significant probability of being false positives must be considered in multi-station localizations. Calls are located by maximizing a likelihood function that considers each strong detection in turn as the earliest arrival time and seeks to fit the times of subsequent detections within a feasible time and distance window. An alternative procedure seeks solutions that maximize the sum of detections after weighting by detection strength and proximity. Both approaches yield many spurious solutions that can mix detections from different B calls and include false detections such as misidentified A calls. Reliable tracks can be obtained iteratively by assigning detections to localizations that are grouped in space and time and requiring groups of at least 20 locations. Smooth paths are fit to tracks either by including constraints that minimize changes in speed and direction while fitting the locations to within their uncertainties, or by applying the double-difference relocation method. The reliability of localizations in future experiments might be improved by increasing sampling rates and detecting harmonics of the B call.
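The spectrogram cross-correlation detection step can be illustrated on toy data. This sketch injects a synthetic ~15 Hz tonal "B call" into noise and correlates a crude tonal template along the spectrogram's time axis; the template shape, band limits, and all parameters are assumptions for illustration, not the authors' detector:

```python
import numpy as np
from scipy.signal import spectrogram, fftconvolve

fs = 100  # Hz, a typical ocean bottom seismometer sampling rate
t = np.arange(60 * fs) / fs
rng = np.random.default_rng(2)
x = 0.1 * rng.normal(size=t.size)
# Inject a synthetic 15 Hz tonal "B call", 5 s long, starting at t = 20 s.
call = (t > 20) & (t < 25)
x[call] += np.sin(2 * np.pi * 15 * t[call])

f, tt, S = spectrogram(x, fs, nperseg=256, noverlap=192)
band = (f >= 14) & (f <= 16)           # first harmonic of the B call (~15 Hz)
template = np.ones(8)                  # crude constant-tone template, 8 frames
score = fftconvolve(S[band].mean(0), template, mode="same")
det_time = tt[np.argmax(score)]        # time of the strongest detection
print(round(det_time, 1))
```

A real detector would normalize the correlation score and threshold it to produce ranked detections, which the abstract's localization stage then combines across stations.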
Affiliation(s)
- William S. D. Wilcock
- School of Oceanography, University of Washington, Seattle, WA, United States of America
- Rose S. Hilmo
- School of Oceanography, University of Washington, Seattle, WA, United States of America
5
Mohammed EA, Keyhani M, Sanati-Nezhad A, Hejazi SH, Far BH. An ensemble learning approach to digital corona virus preliminary screening from cough sounds. Sci Rep 2021; 11:15404. [PMID: 34321592] [PMCID: PMC8319422] [DOI: 10.1038/s41598-021-95042-2]
Abstract
This work develops a robust classifier for a COVID-19 pre-screening model from crowdsourced cough sound data. The crowdsourced recordings contain a variable number of coughs, with some input sound files more informative than others. Accurate detection of COVID-19 from these datasets requires overcoming two main challenges: (i) the variable number of coughs in each recording and (ii) the low number of COVID-positive cases compared to healthy coughs. We use two open datasets of crowdsourced cough recordings and segment each recording into non-overlapping coughs. The segmentation enriches the original data without oversampling: splitting the sound files increases the number of minority-class (COVID-19) samples without the changes to the feature distribution that oversampling techniques would introduce. Each cough segment is transformed into six image representations for further analysis. We conduct extensive experiments with shallow machine learning, Convolutional Neural Network (CNN), and pre-trained CNN models, and compare our results to other recently published work applying machine learning to cough sound data for COVID-19 detection. An ensemble model demonstrated high performance on the testing dataset, with an area under the receiver operating characteristic curve of 0.77, precision of 0.80, recall of 0.71, F1 of 0.75, and Kappa of 0.53, an improvement in prediction accuracy over the other models.
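The segmentation idea, splitting one recording into non-overlapping candidate coughs, can be sketched with a simple frame-energy threshold. The frame length and threshold ratio below are assumptions for illustration, not the paper's parameters:

```python
import numpy as np

def segment_coughs(x, fs, frame_ms=25, thresh_ratio=0.1):
    """Split a recording into non-overlapping high-energy segments (candidate coughs)."""
    frame = int(fs * frame_ms / 1000)
    n = len(x) // frame
    energy = (x[: n * frame].reshape(n, frame) ** 2).sum(axis=1)
    active = energy > thresh_ratio * energy.max()
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                                  # segment opens
        elif not a and start is not None:
            segments.append((start * frame, i * frame))  # segment closes
            start = None
    if start is not None:
        segments.append((start * frame, n * frame))
    return segments

fs = 8000
rng = np.random.default_rng(3)
x = 0.01 * rng.normal(size=2 * fs)        # 2 s of quiet background noise
x[4000:6000] += rng.normal(size=2000)     # loud burst: cough 1
x[10000:12000] += rng.normal(size=2000)   # loud burst: cough 2
print(segment_coughs(x, fs))
```

Each returned (start, end) sample range would then be converted to the image representations the abstract mentions before classification.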
Affiliation(s)
- Emad A Mohammed
- Department of Electrical and Software Engineering, University of Calgary, Calgary, T2N 1N4, Canada
- Mohammad Keyhani
- Haskayne School of Business, University of Calgary, Calgary, T2N 1N4, Canada
- Amir Sanati-Nezhad
- Department of Mechanical and Manufacturing Engineering, University of Calgary, Calgary, T2N 1N4, Canada
- S Hossein Hejazi
- Department of Chemical and Petroleum Engineering, University of Calgary, Calgary, T2N 1N4, Canada
- Behrouz H Far
- Department of Electrical and Software Engineering, University of Calgary, Calgary, T2N 1N4, Canada
6
Wermke K, Robb MP, Schluter PJ. Melody complexity of infants' cry and non-cry vocalisations increases across the first six months. Sci Rep 2021; 11:4137. [PMID: 33602997] [PMCID: PMC7893022] [DOI: 10.1038/s41598-021-83564-8]
Abstract
In early infancy, melody provides the most salient prosodic element for language acquisition, and there is ample evidence of infants' precocious aptitude for perceiving musical and speech melody. Yet little is known about the melody patterns of infants' own vocalisations. In a search for developmental regularities of cry and non-cry vocalisations, and for building blocks of prosody (intonation), over the first 6 months of life, more than 67,500 melodies (fundamental frequency contours) from 277 healthy infants in monolingual German families were quantitatively analysed. Based on objective criteria, vocalisations with well-identifiable melodies were grouped into those exhibiting a simple (single-arc) or complex (multiple-arc) melody pattern. Longitudinal analyses using fractional polynomial multilevel mixed-effects logistic regression models were applied to these patterns. A significant age-dependent (but not sex-dependent) developmental trend towards greater complexity was demonstrated in both vocalisation types over the observation period. The theoretical model of melody development (MD-Model) contends that melody complexification is an important building block on the path towards language. Recognition of this developmental process will considerably improve our understanding of early preparatory processes for language acquisition and, most importantly, allow the creation of clinically robust risk markers for developmental language disorders.
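The single-arc versus multiple-arc grouping amounts to counting rise-fall arcs in each f0 contour. A hypothetical sketch of such a criterion (the peak-prominence threshold and the synthetic contours are assumptions, not the study's objective criteria):

```python
import numpy as np
from scipy.signal import find_peaks

def melody_arcs(f0_hz, prominence=10.0):
    """Count rise-fall arcs in a fundamental-frequency contour (Hz)."""
    peaks, _ = find_peaks(f0_hz, prominence=prominence)
    return len(peaks)

t = np.linspace(0.0, 1.0, 200)
single_arc = 400 + 80 * np.sin(np.pi * t)      # one rise-fall: simple pattern
multi_arc = 400 + 60 * np.sin(3 * np.pi * t)   # two maxima: complex pattern

for contour in (single_arc, multi_arc):
    n = melody_arcs(contour)
    print("complex" if n > 1 else "simple", n)
```

On real cry data the contour would first be extracted by a pitch tracker and smoothed, since octave errors and voicing gaps would otherwise inflate the arc count.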
Affiliation(s)
- Kathleen Wermke
- Center for Pre-Speech Development & Developmental Disorders, University Hospital, University of Würzburg, Pleicherwall 2, 97070, Würzburg, Germany.
- Michael P Robb
- Department of Communication Sciences and Disorders, Pennsylvania State University, State College, USA
- School of Health Sciences, University of Canterbury - Te Whare Wānanga O Waitaha, Christchurch, New Zealand
- Philip J Schluter
- School of Health Sciences, University of Canterbury - Te Whare Wānanga O Waitaha, Christchurch, New Zealand
- School of Clinical Medicine, Primary Care Clinical Unit, The University of Queensland, Brisbane, Australia
7
Ramallo-González AP, González-Vidal A, Skarmeta AF. CIoTVID: Towards an Open IoT-Platform for Infective Pandemic Diseases such as COVID-19. Sensors (Basel) 2021; 21:E484. [PMID: 33445499] [PMCID: PMC7827168] [DOI: 10.3390/s21020484]
Abstract
The factors affecting the penetration of diseases such as COVID-19 through society are still poorly understood. Internet of Things (IoT) technologies can play a crucial role in times of crisis, providing a more holistic view of the factors that govern the outbreak of a contagious disease. The understanding of COVID-19 will be enriched by analysing data related to the phenomenon, and such data can be collected with IoT sensors. In this paper, we present an integrated IoT-based solution, named CIoTVID, that can serve as an opportunistic health-data acquisition agent for combating the COVID-19 pandemic. The platform is composed of four layers: data acquisition, data aggregation, machine intelligence, and services. To demonstrate its validity, the solution was successfully tested on a use case that builds a classifier of medical conditions from real voice data. The data aggregation layer is particularly relevant in this kind of solution, as data coming from medical devices has a very different nature to that coming from electronic sensors. Given the platform's adaptability to heterogeneous data and data volumes, individuals, policymakers, and clinics could all benefit from it in fighting the propagation of the pandemic.
8
Leon-Lopez B, Romero-Vivas E, Viloria-Gomora L. Reduction of roadway noise in a coastal city underwater soundscape during COVID-19 confinement. J Acoust Soc Am 2021; 149:652. [PMID: 33514174] [PMCID: PMC7857497] [DOI: 10.1121/10.0003354]
Abstract
Confinement due to the COVID-19 pandemic drastically reduced human activities. This study discusses underwater soundscape variations, comparing a typical day and a confinement day in a coastal lagoon near a popular tourist city in Mexico. Recording devices were located at 2 m depth, 430 m from the main promenade, a two-way avenue for light vehicle traffic where the main tourist infrastructure is located. The nearby marine environment is habitat to birds and dolphins as well as fish and invertebrates of commercial importance, and medium and small boats usually transit the area. The main underwater sound level reduction was measured at low frequencies (10-2000 Hz), reflecting the decrease in roadway noise. Vessel traffic also decreased by almost three quarters, although the level reduction attributable to this source was less noticeable. Because typical-day levels in the roadway noise band can potentially mask fish sounds and affect other noise-sensitive low-frequency marine taxa, this study suggests that comprehensive noise analysis in coastal marine environments should consider the contribution of nearby land sources.
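A band-limited level comparison like the one above reduces to filtering each recording to the band of interest and differencing the RMS levels in dB. A minimal sketch on synthetic noise (the sampling rate, filter order, and the ~10 dB synthetic reduction are assumptions, not the study's measurements):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def band_level_db(x, fs, lo=10, hi=2000):
    """RMS level (dB re full scale) of x within the [lo, hi] Hz band."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    y = sosfilt(sos, x)
    return 20 * np.log10(np.sqrt(np.mean(y ** 2)))

fs = 48_000
rng = np.random.default_rng(4)
typical = rng.normal(size=fs)           # "typical day": stronger broadband noise
confined = 0.3 * rng.normal(size=fs)    # "confinement day": ~10 dB lower level

reduction = band_level_db(typical, fs) - band_level_db(confined, fs)
print(f"low-frequency band reduction ~ {reduction:.1f} dB")
```

With calibrated hydrophone data the same difference would be expressed in dB re 1 µPa rather than re full scale.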
Affiliation(s)
- Braulio Leon-Lopez
- Acoustics and Signal Processing Research Group, Centro de Investigaciones Biológicas del Noroeste (CIBNOR), Avenida IPN 195, Playa Palo de Santa Rita Sur, C.P. 23096 La Paz, Baja California Sur, Mexico
- Eduardo Romero-Vivas
- Acoustics and Signal Processing Research Group, Centro de Investigaciones Biológicas del Noroeste (CIBNOR), Avenida IPN 195, Playa Palo de Santa Rita Sur, C.P. 23096 La Paz, Baja California Sur, Mexico
- Lorena Viloria-Gomora
- Programa de Investigación en Mamíferos Marinos, Universidad Autónoma de Baja California Sur (UABCS), Carretera al Sur km 5.5, Mezquitito, C.P. 23080 La Paz, Baja California Sur, Mexico
9
Abstract
Background: Transient noise can be disruptive for people wearing hearing aids. Ideally, the transient noise should be detected and controlled by the signal processor without disrupting speech and other intended input signals. A technology for detecting and controlling transient noises in hearing aids was evaluated in this study.
Purpose: The purpose of this study was to evaluate the effectiveness of a transient noise reduction strategy on various transient noises and to determine whether the strategy has a negative impact on sound quality of intended speech inputs.
Research Design: This was a quasi-experimental study. The study involved 24 hearing aid users. Each participant was asked to rate the parameters of speech clarity, transient noise loudness, and overall impression for speech stimuli under the algorithm-on and algorithm-off conditions. During the evaluation, three types of stimuli were used: transient noises, speech, and background noises. The transient noises included “knife on a ceramic board,” “mug on a tabletop,” “office door slamming,” “car door slamming,” and “pen tapping on countertop.” The speech sentences used for the test were presented by a male speaker in Mandarin. The background noises included “party noise” and “traffic noise.” All of these sounds were combined into five listening situations: (1) speech only, (2) transient noise only, (3) speech and transient noise, (4) background noise and transient noise, and (5) speech and background noise and transient noise.
Results: There was no significant difference in ratings of speech clarity between algorithm-on and algorithm-off (t-test, p = 0.103). Further analysis revealed that speech clarity was significantly better at 70 dB SPL than at 55 dB SPL (p < 0.001). For transient noise loudness: under the algorithm-off condition, the percentages of subjects rating the transient noise as somewhat soft, appropriate, somewhat loud, and too loud were 0.2, 47.1, 29.6, and 23.1%, respectively; under algorithm-on, the corresponding percentages were 3.0, 72.6, 22.9, and 1.4%. A significant difference in transient noise loudness ratings was found between algorithm-on and algorithm-off (t-test, p < 0.001). For overall impression of speech stimuli: under the algorithm-off condition, the percentages of subjects rating the algorithm as not helpful at all, somewhat helpful, helpful, and very helpful were 36.5, 20.8, 33.9, and 8.9%, respectively; under algorithm-on, the corresponding percentages were 35.0, 19.3, 30.7, and 15.0%. Statistical analysis revealed a significant difference in overall impression ratings: the algorithm-on condition was rated significantly more helpful for speech understanding than algorithm-off (t-test, p < 0.001).
Conclusions: The transient noise reduction strategy appropriately controlled the loudness for most of the transient noises and did not affect the sound quality, which could be beneficial to hearing aid wearers.
Affiliation(s)
- HaiHong Liu
- Beijing Tong Ren Hospital, Capital Medical University, Beijing, China
10
Barbier G, Perrier P, Payan Y, Tiede MK, Gerber S, Perkell JS, Ménard L. What anticipatory coarticulation in children tells us about speech motor control maturity. PLoS One 2020; 15:e0231484. [PMID: 32287289] [PMCID: PMC7156059] [DOI: 10.1371/journal.pone.0231484]
Abstract
PURPOSE This study aimed to evaluate the role of motor control immaturity in the speech production characteristics of 4-year-old children, compared to adults. Specifically, two indices were examined: trial-to-trial variability, which is assumed to be linked to motor control accuracy, and anticipatory extra-syllabic vowel-to-vowel coarticulation, which is assumed to be linked to the comprehensiveness, maturity and efficiency of sensorimotor representations in the central nervous system. METHOD Acoustic and articulatory (ultrasound) data were recorded for 20 children and 10 adults, all native speakers of Canadian French, during the production of isolated vowels and vowel-consonant-vowel (V1-C-V2) sequences. Trial-to-trial variability was measured in isolated vowels. Extra-syllabic anticipatory coarticulation was assessed in V1-C-V2 sequences by measuring the patterns of variability of V1 associated with variations in V2. Acoustic data were reported for all subjects, and articulatory data for a subset of 6 children and 2 adults. RESULTS Trial-to-trial variability was significantly larger in children. Systematic and significant anticipation of V2 in V1 was always found in adults but was rare in children. Significant anticipation was observed in children only when V1 was /a/, and only along the antero-posterior dimension, with a much smaller magnitude than in adults. A closer analysis of individual speakers revealed that some children showed adult-like anticipation along this dimension, whereas the majority did not. CONCLUSION The larger trial-to-trial variability and the lack of anticipatory behavior in most children, two phenomena that have also been observed in several non-speech motor tasks, support the hypothesis that motor control immaturity may explain a large part of the differences between speech production in adults and in 4-year-old children, apart from other causes that may be linked to language development.
Affiliation(s)
- Guillaume Barbier
- Grenoble INP, CNRS, GIPSA-Lab UMR 5216, Univ. Grenoble Alpes, Grenoble, France
- Pascal Perrier
- Grenoble INP, CNRS, GIPSA-Lab UMR 5216, Univ. Grenoble Alpes, Grenoble, France
- Yohan Payan
- Grenoble INP, CNRS, TIMC-IMAG UMR 5525, Univ. Grenoble Alpes, Grenoble, France
- Mark K. Tiede
- Haskins Laboratories, New Haven, Connecticut, United States of America
- Silvain Gerber
- Grenoble INP, CNRS, GIPSA-Lab UMR 5216, Univ. Grenoble Alpes, Grenoble, France
- Joseph S. Perkell
- Boston University, Boston, Massachusetts, United States of America
- Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Lucie Ménard
- Department of Linguistics, Université du Québec à Montréal, Montréal, Québec, Canada
11
Zhang T, Shao Y, Wu Y, Pang Z, Liu G. Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder. IEEE J Biomed Health Inform 2020; 24:1940-1951. [PMID: 32149701] [DOI: 10.1109/jbhi.2020.2978103]
Abstract
Individuals such as voice professionals, elderly people and smokers increasingly suffer from voice disorders, which underscores the importance of pathological voice repair. Previous work on pathological voice repair addressed only the sustained vowel /a/; repairing multiple vowels remains challenging due to unstable pitch extraction and unsatisfactory formant reconstruction. In this paper, a multiple-vowel repair method based on pitch extraction and the Line Spectrum Pair feature is proposed, which broadens the scope of voice repair from the single vowel /a/ to the vowels /a/, /i/ and /u/ and repairs them successfully. Using a deep neural network as the classifier, voices are first classified as normal or pathological. Wavelet Transform and Hilbert-Huang Transform are applied for pitch extraction, and the formant is reconstructed from the Line Spectrum Pair (LSP) feature. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on the Saarbrücken Voice Database (SVD). The achieved improvements in three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure and Mel cepstral distance measure, are 45.87%, 50.37% and 15.56%, respectively. In addition, an intuitive spectrogram-based analysis shows a prominent repair effect.
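Pitch extraction is the fragile step this paper targets with wavelets and the Hilbert-Huang Transform. For orientation only, a much simpler autocorrelation-based pitch estimator, not the authors' method, can be sketched on a synthetic vowel-like signal:

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=75, fmax=400):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)            # plausible lag range
    lag = lo + np.argmax(ac[lo:hi])                    # strongest periodicity
    return fs / lag

fs = 16_000
t = np.arange(int(0.1 * fs)) / fs
# Synthetic vowel-like signal: 120 Hz fundamental plus two harmonics.
x = (np.sin(2 * np.pi * 120 * t)
     + 0.5 * np.sin(2 * np.pi * 240 * t)
     + 0.25 * np.sin(2 * np.pi * 360 * t))
print(round(pitch_autocorr(x, fs), 1))
```

Autocorrelation works well on clean periodic signals like this one; on pathological voices the periodicity itself is degraded, which is exactly why the paper resorts to more robust wavelet/HHT pipelines.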
12
Yurlova DD, Volodin IA, Ilchenko OG, Volodina EV. Rapid development of mature vocal patterns of ultrasonic calls in a fast-growing rodent, the yellow steppe lemming (Eolagurus luteus). PLoS One 2020; 15:e0228892. [PMID: 32045453] [PMCID: PMC7015103] [DOI: 10.1371/journal.pone.0228892]
Abstract
Ultrasonic vocalizations (USV) of laboratory rodents may serve as age-dependent indicators of emotional arousal and anxiety. Fast-growing Arvicolinae rodent species might be advantageous wild-type animal models for behavioural and medical research on USV ontogeny. For the yellow steppe lemming Eolagurus luteus, only the audible calls of adults had previously been described. This study provides categorization and spectrographic analyses of 1176 USV calls emitted by 120 individual yellow steppe lemmings across 12 age classes, from birth to breeding adults over 90 days (d) of age, with 10 individuals per age class and up to 10 USV calls per individual. USV calls emerged from the first day of pup life and occurred at all 12 age classes and in both sexes. The unified 2-min isolation procedure on unfamiliar territory was equally applicable for inducing USV calls at all age classes. Rapid physical growth (1 g body weight gain per day from birth to 40 d of age) and early (9-12 d) eye opening correlated with the early (9-12 d) emergence of mature USV vocal patterns. The mature patterns included a prominent shift in the percentages of chevron and upward contours of the fundamental frequency (f0) and changes in the acoustic variables of the calls. Call duration was longest at 1-4 d, significantly shorter at 9-12 d, and did not differ between 9-12 d and older age classes. The maximum fundamental frequency (f0max) decreased with age, from about 50 kHz in neonates to about 40 kHz in adults. These ontogenetic pathways of USV duration and f0max (towards shorter, lower-frequency calls) are reminiscent of those in laboratory mice Mus musculus.
Affiliation(s)
- Daria D. Yurlova
- Department of Vertebrate Zoology, Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Ilya A. Volodin
- Department of Vertebrate Zoology, Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Scientific Research Department, Moscow Zoo, Moscow, Russia
13
Tachibana RO, Kanno K, Okabe S, Kobayasi KI, Okanoya K. USVSEG: A robust method for segmentation of ultrasonic vocalizations in rodents. PLoS One 2020; 15:e0228907. [PMID: 32040540] [PMCID: PMC7010259] [DOI: 10.1371/journal.pone.0228907]
Abstract
Rodents' ultrasonic vocalizations (USVs) provide useful information for assessing their social behaviors. Despite previous efforts to classify subcategories of the time-frequency patterns of USV syllables and study their functional relevance, methods for detecting vocal elements in continuously recorded data have remained sub-optimal. Here, we propose a novel procedure for detecting USV segments in continuous sound data containing the background noise recorded during observation of social behavior. The procedure utilizes a stabilized version of the sound spectrogram and additional signal processing to better separate vocal signals by reducing variation in the background noise, and it provides precise time tracking of spectral peaks within each syllable. We demonstrate that the procedure can be applied to a variety of USVs obtained from several rodent species, and performance tests showed greater accuracy in detecting USV syllables than conventional detection methods.
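The core idea, separating tonal vocal energy from a variable noise floor before thresholding, can be illustrated with a per-frequency-bin median subtraction on a synthetic recording. The syllable frequency, threshold, and spectrogram settings below are assumptions for illustration, not USVSEG's actual parameters:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 250_000  # Hz, a typical USV recording rate
t = np.arange(int(0.5 * fs)) / fs
rng = np.random.default_rng(5)
x = 0.05 * rng.normal(size=t.size)
# Synthetic 60 kHz USV syllable from 0.2 to 0.25 s.
syl = (t > 0.2) & (t < 0.25)
x[syl] += np.sin(2 * np.pi * 60_000 * t[syl])

f, tt, S = spectrogram(x, fs, nperseg=512, noverlap=256)
S_db = 10 * np.log10(S + 1e-12)
flat = S_db - np.median(S_db, axis=1, keepdims=True)  # subtract per-bin noise floor
frame_score = flat.max(axis=0)                        # peak flattened level per frame
active = frame_score > 20                             # dB threshold (assumed)
on = tt[active]
print(round(on.min(), 3), round(on.max(), 3))         # detected on/off times (s)
```

The per-bin median is robust here because the syllable occupies only a small fraction of the frames, so the noise floor estimate is barely biased by the call itself.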
Affiliation(s)
- Ryosuke O. Tachibana
- Department of Life Sciences, Graduate School of Arts & Sciences, The University of Tokyo, Tokyo, Japan
- Kouta Kanno
- Laboratory of Neuroscience, Course of Psychology, Department of Humanities, Faculty of Law, Economics and the Humanities, Kagoshima University, Kagoshima, Japan
- Shota Okabe
- Division of Brain and Neurophysiology, Department of Physiology, Jichi Medical University, Tochigi, Japan
- Kohta I. Kobayasi
- Graduate School of Life and Medical Sciences, Doshisha University, Kyoto, Japan
- Kazuo Okanoya
- Department of Life Sciences, Graduate School of Arts & Sciences, The University of Tokyo, Tokyo, Japan
14
Du X, Carpentier L, Teng G, Liu M, Wang C, Norton T. Assessment of Laying Hens' Thermal Comfort Using Sound Technology. Sensors (Basel) 2020; 20:E473. PMID: 31947639; PMCID: PMC7013866; DOI: 10.3390/s20020473.
Abstract
Heat stress is one of the most important environmental stressors facing poultry production and welfare worldwide. The detrimental effects of heat stress on poultry range from reduced growth and egg production to impaired health. Animal vocalisations are associated with different animal responses and can be used as useful indicators of the state of animal welfare. It is already known that specific chicken vocalisations such as alarm, squawk, and gakel calls are correlated with stressful events, and therefore, could be used as stress indicators in poultry monitoring systems. In this study, we focused on developing a hen vocalisation detection method based on machine learning to assess their thermal comfort condition. For extraction of the vocalisations, nine source-filter theory related temporal and spectral features were chosen, and a support vector machine (SVM) based classifier was developed. As a result, the classification performance of the optimal SVM model was 95.1 ± 4.3% (the sensitivity parameter) and 97.6 ± 1.9% (the precision parameter). Based on the developed algorithm, the study illustrated that a significant correlation existed between specific vocalisations (alarm and squawk call) and thermal comfort indices (temperature-humidity index, THI) (alarm-THI, R = -0.414, P = 0.01; squawk-THI, R = 0.594, P = 0.01). This work represents the first step towards the further development of technology to monitor flock vocalisations with the intent of providing producers an additional tool to help them actively manage the welfare of their flock.
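The classification stage can be sketched as follows. The paper's nine source-filter features and its SVM are not reproduced here; the spectral features below and the nearest-centroid stand-in classifier are illustrative only:

```python
import numpy as np

def spectral_features(x, fs):
    """A few illustrative spectral descriptors (the paper's exact nine
    source-filter features are not reproduced here)."""
    X = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2  # power spectrum
    f = np.fft.rfftfreq(x.size, 1 / fs)
    p = X / X.sum()
    centroid = (f * p).sum()
    bandwidth = np.sqrt(((f - centroid) ** 2 * p).sum())
    rolloff = f[np.searchsorted(np.cumsum(p), 0.85)]   # 85% energy rolloff
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2     # zero-crossing rate
    return np.array([centroid, bandwidth, rolloff, zcr])

class NearestCentroid:
    """Stand-in for the paper's SVM: classify by distance to the class
    means of z-scored features."""
    def fit(self, F, y):
        self.mu, self.sd = F.mean(0), F.std(0) + 1e-9
        Z = (F - self.mu) / self.sd
        self.centroids = {c: Z[y == c].mean(0) for c in np.unique(y)}
        return self
    def predict(self, F):
        Z = (F - self.mu) / self.sd
        return np.array([min(self.centroids,
                             key=lambda c: np.linalg.norm(z - self.centroids[c]))
                         for z in Z])

# Toy data: one low-frequency and one high-frequency call type.
rng = np.random.default_rng(0)
fs, n = 16_000, 2048
t = np.arange(n) / fs
def tone(f0):
    return np.sin(2 * np.pi * f0 * t) + 0.1 * rng.standard_normal(n)
calls = [tone(500) for _ in range(10)] + [tone(3000) for _ in range(10)]
labels = np.array([0] * 10 + [1] * 10)
F = np.vstack([spectral_features(c, fs) for c in calls])
clf = NearestCentroid().fit(F, labels)
acc = (clf.predict(F) == labels).mean()
```

In practice the feature matrix would be fed to an SVM with tuned hyperparameters, as in the paper.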
Affiliation(s)
- Xiaodong Du
- College of Water Resources & Civil Engineering, China Agricultural University, Beijing 100083, China
- Lenn Carpentier
- Division Measure, Model & Manage Bioresponses, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 30, 3001 Heverlee, Belgium
- Guanghui Teng
- College of Water Resources & Civil Engineering, China Agricultural University, Beijing 100083, China
- Mulin Liu
- College of Water Resources & Civil Engineering, China Agricultural University, Beijing 100083, China
- Chaoyuan Wang
- College of Water Resources & Civil Engineering, China Agricultural University, Beijing 100083, China
- Tomas Norton
- Division Measure, Model & Manage Bioresponses, Department of Biosystems, KU Leuven, Kasteelpark Arenberg 30, 3001 Heverlee, Belgium
15
Sundström E, Oren L. Sound production mechanisms of audible nasal emission during the sibilant /s/. J Acoust Soc Am 2019; 146:4199. PMID: 31893718; PMCID: PMC7043896; DOI: 10.1121/1.5135566.
Abstract
Audible nasal emission is a speech disorder that involves undesired sound generated by airflow into the nasal cavity during production of oral sounds. This disorder is associated with small-to-medium sized velopharyngeal openings. These openings induce turbulence in the nasal cavity, which in turn produces sound. The purpose of this study is to examine the aeroacoustic mechanisms that generate turbulent sound during production of a sibilant /s/ with and without a small opening of the velopharyngeal valve. The models are based on two pediatric subjects who were diagnosed with severe audible nasal emission. The geometries were delineated from computed tomography scans taken while the subjects were sustaining a sibilant sound. Large eddy simulation with the Ffowcs Williams and Hawkings analogy was used to predict the flow behavior and its acoustic characterization. It shows that the majority of the acoustic energy is produced by surface loading, which is related to dipole sources that resonate in the nasal cavity. The quadrupole source term that is associated with the unsteady shear layers is seen to be less significant. It also shows that closure of the velopharyngeal valve changes the far-field spectrum significantly because aeroacoustic mechanisms in the nasal cavity are eliminated.
Affiliation(s)
- Elias Sundström
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, 231 Albert Sabin Way, Cincinnati, Ohio 45267, USA
- Liran Oren
- Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, 231 Albert Sabin Way, Cincinnati, Ohio 45267, USA
16
Novak A, Cisar P, Bruneau M, Lotton P, Simon L. Localization of sound-producing fish in a water-filled tank. J Acoust Soc Am 2019; 146:4842. PMID: 31893704; DOI: 10.1121/1.5138607.
Abstract
In this paper, the authors introduce an algorithm for locating sound-producing fish in a small rectangular tank that can be used, e.g., in behavioral bioacoustical studies to determine which fish in a group is producing sound. The technique consists of locating a single sound source in the tank using signals gathered by four hydrophones placed in the tank together with a group of fish under study. The localization algorithm is based on the ratio of two spectral ratios: the ratio between the sound pressures measured by hydrophones at two locations, and the ratio between the theoretical Green's functions at the same locations. The results are compared to a localization based on an image-processing technique applied to video recordings acquired synchronously with the acoustic recordings.
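The spectra-ratio idea can be sketched as a grid search: pick the candidate position whose Green's-function ratios best match the measured pressure ratios. Note that the paper matches against the Green's function of the reverberant tank; the free-field Green's function, tank geometry, and frequency below are simplifying assumptions for illustration:

```python
import numpy as np

c = 1482.0  # nominal speed of sound in water, m/s

def greens_freefield(src, rcv, k):
    """Free-field Green's function exp(ikr)/(4*pi*r). The paper's tank
    model would use the rectangular-cavity Green's function instead."""
    r = np.linalg.norm(np.asarray(src) - np.asarray(rcv))
    return np.exp(1j * k * r) / (4 * np.pi * r)

def locate(pressures, hydrophones, freq, grid):
    """Pick the grid point whose Green's-function ratios (relative to
    hydrophone 0) best match the measured pressure ratios."""
    k = 2 * np.pi * freq / c
    meas = pressures[1:] / pressures[0]
    best, best_err = None, np.inf
    for g in grid:
        model = np.array([greens_freefield(g, h, k) for h in hydrophones])
        err = np.abs(model[1:] / model[0] - meas).sum()
        if err < best_err:
            best, best_err = g, err
    return best

# Four hydrophones near the corners of a small tank (illustrative layout).
hydros = [(0.0, 0.0, 0.2), (0.8, 0.0, 0.2), (0.0, 0.6, 0.2), (0.8, 0.6, 0.2)]
true_src = (0.30, 0.40, 0.20)
freq = 800.0
k = 2 * np.pi * freq / c
p = np.array([greens_freefield(true_src, h, k) for h in hydros])
grid = [(x, y, 0.2) for x in np.arange(0.05, 0.8, 0.05)
                    for y in np.arange(0.05, 0.6, 0.05)]
est = locate(p, hydros, freq, grid)
```

Because only pressure ratios are used, the unknown source strength cancels out, which is the point of the ratio-of-ratios formulation.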
Affiliation(s)
- Antonin Novak
- Laboratoire d'Acoustique de l'Université du Mans (LAUM, UMR CNRS 6613), 72000 Le Mans, France
- Petr Cisar
- Laboratory of Signal and Image Processing, Institute of Complex Systems, South Bohemian Research Center of Aquaculture and Biodiversity of Hydrocenoses, Faculty of Fisheries and Protection of Waters, University of South Bohemia in České Budĕjovice, Zámek 136, Nové Hrady 37333, Czech Republic
- Michel Bruneau
- Laboratoire d'Acoustique de l'Université du Mans (LAUM, UMR CNRS 6613), 72000 Le Mans, France
- Pierrick Lotton
- Laboratoire d'Acoustique de l'Université du Mans (LAUM, UMR CNRS 6613), 72000 Le Mans, France
- Laurent Simon
- Laboratoire d'Acoustique de l'Université du Mans (LAUM, UMR CNRS 6613), 72000 Le Mans, France
17
Ferguson EL, Ferguson BG. High-precision acoustic localization of dolphin sonar click transmissions using a modified method of passive ranging by wavefront curvature. J Acoust Soc Am 2019; 146:4790. PMID: 31893743; DOI: 10.1121/1.5138935.
Abstract
When configured as a wide aperture array, only three hydrophones are required to localize dolphin sonar transmissions with unprecedented precision, even when the underwater sound scene of their natural habitat is complicated by many of them emitting echolocation "click" signals at the same time. Given the sensor position coordinates and the speed of sound, the passive ranging by wavefront curvature algorithm estimates the source range and bearing using range-difference measurements between signals arriving at two adjacent pairs of widely spaced sensors. If the sensor positions are not strictly collinear, then the source range estimates are biased. This problem is overcome by modifying the input parameters to the basic passive ranging algorithm. The experimental results for the estimated source positions are found to agree with the predicted localization performance for a wide aperture array passive ranging sonar. The precision of the source bearing estimates is 0.005°, which is independent of the source range. The precision of the source range estimates degrades a hundredfold (from 2.5 cm to 2.6 m) for a tenfold increase in source range (33-318 m). A lower bound for the peak-to-peak source levels of Indo-Pacific bottlenose dolphins (Tursiops aduncus) is 183 ± 2 dB re 1 μPa for regular click pulses.
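For three collinear sensors, the second-order (wavefront-curvature) expansion of the arrival times gives closed-form range and bearing estimates. The sketch below uses illustrative sensor spacing and source geometry, not the paper's array or its bias-correction for non-collinear sensors:

```python
import numpy as np

c = 1500.0  # nominal underwater sound speed, m/s
d = 7.0     # sensor spacing, m (illustrative)

def prwc(tau1, tau3):
    """Passive ranging by wavefront curvature for three collinear sensors
    at x = -d, 0, +d. tau1/tau3 are arrival-time differences of the outer
    sensors relative to the middle one. Uses the second-order wavefront
    approximation: r(x) ~ R - x*sin(theta) + x**2*cos(theta)**2/(2R)."""
    sin_theta = c * (tau1 - tau3) / (2 * d)        # bearing from broadside
    theta = np.arcsin(sin_theta)
    R = d ** 2 * np.cos(theta) ** 2 / (c * (tau1 + tau3))  # curvature -> range
    return R, np.degrees(theta)

# Simulate exact delays from a source at range 60 m, bearing 25 degrees.
R_true, theta_true = 60.0, np.radians(25.0)
src = np.array([R_true * np.sin(theta_true), R_true * np.cos(theta_true)])
sensors = np.array([[-d, 0.0], [0.0, 0.0], [d, 0.0]])
r = np.linalg.norm(src - sensors, axis=1)
tau1, tau3 = (r[0] - r[1]) / c, (r[2] - r[1]) / c
R_est, bearing_est = prwc(tau1, tau3)
```

The sum of the two delays isolates the curvature term (and hence range), while their difference isolates the bearing, which is why bearing precision is far less sensitive to range than range precision is.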
Affiliation(s)
- Eric L Ferguson
- Australian Centre for Field Robotics, The University of Sydney, New South Wales, 2006, Australia
- Brian G Ferguson
- Defence Science and Technology Group, Department of Defence, Australia
18
Schaffeld T, Ruser A, Woelfing B, Baltzer J, Kristensen JH, Larsson J, Schnitzler JG, Siebert U. The use of seal scarers as a protective mitigation measure can induce hearing impairment in harbour porpoises. J Acoust Soc Am 2019; 146:4288. PMID: 31893707; DOI: 10.1121/1.5135303.
Abstract
Acoustic deterrent devices (ADDs) are used to deter seals from aquacultures, but exposure of harbour porpoises (Phocoena phocoena) occurs as a side-effect. At construction sites, by contrast, ADDs are used to deter harbour porpoises from the zone in which pile driving noise can induce temporary threshold shifts (TTSs). ADDs emit such high pressure levels that there is concern that ADDs themselves may induce a TTS. A harbour porpoise in human care was exposed to an artificial ADD signal with a peak frequency of 14 kHz. A significant TTS, measured by auditory evoked potentials, was found, with an onset of 142 dB re 1 μPa²s at 20 kHz and 147 dB re 1 μPa²s at 28 kHz. The authors therefore strongly recommend gradually increasing, and down-regulating, the source levels of ADDs to match the desired deterrence range. However, further research is needed to establish a reliable relationship between received levels and deterrence.
Affiliation(s)
- Tobias Schaffeld
- Institute for Terrestrial and Aquatic Wildlife Research (ITAW), University of Veterinary Medicine Hannover, Foundation, Werftstrasse 6, 25761 Buesum, Germany
- Andreas Ruser
- Institute for Terrestrial and Aquatic Wildlife Research (ITAW), University of Veterinary Medicine Hannover, Foundation, Werftstrasse 6, 25761 Buesum, Germany
- Benno Woelfing
- Institute for Terrestrial and Aquatic Wildlife Research (ITAW), University of Veterinary Medicine Hannover, Foundation, Werftstrasse 6, 25761 Buesum, Germany
- Johannes Baltzer
- Institute for Terrestrial and Aquatic Wildlife Research (ITAW), University of Veterinary Medicine Hannover, Foundation, Werftstrasse 6, 25761 Buesum, Germany
- Joseph G Schnitzler
- Institute for Terrestrial and Aquatic Wildlife Research (ITAW), University of Veterinary Medicine Hannover, Foundation, Werftstrasse 6, 25761 Buesum, Germany
- Ursula Siebert
- Institute for Terrestrial and Aquatic Wildlife Research (ITAW), University of Veterinary Medicine Hannover, Foundation, Werftstrasse 6, 25761 Buesum, Germany
19
Ladich F, Maiditsch IP. Temperature affects sound production in fish with two sets of sonic organs: The Pictus cat. Comp Biochem Physiol A Mol Integr Physiol 2019; 240:110589. PMID: 31648065; DOI: 10.1016/j.cbpa.2019.110589.
Abstract
Sound communication is affected by ambient temperature in ectothermic animals including fishes. The present study examines the effects of temperature on acoustic signaling in a fish species possessing two different sound-generating mechanisms. The Amazonian Pictus catfish Pimelodus pictus produces low-frequency harmonic sounds (swimbladder drumming muscles) and high-frequency stridulation sounds (rubbing pectoral fin spines in the pectoral girdle). Sounds of 15 juveniles were recorded when hand-held after three weeks of acclimation at 30 °C, 22 °C and again 30 °C. The following sound characteristics were investigated: calling activity, sound duration, fundamental frequency of drumming sounds and dominant frequency of stridulation sounds. The number of both sound types produced within the first minute of experiments did not change with temperature. In contrast, sound duration was significantly shorter at 30 °C than at 22 °C (drumming: 78-560 ms; stridulation: 23-96 ms). The fundamental frequency of drumming sounds and thus the drumming muscle contraction rate varied from 127 Hz to 242 Hz and increased with temperature. The dominant frequency of broadband stridulation sounds ranged from 1.67 kHz to 3.39 kHz and was unaffected by temperature changes. Our data demonstrate that temperature affects acoustic signaling in P. pictus, although the changes differed between sound characteristics and sound type. The effects vary from no change in calling activity and dominant frequency, to an increase in fundamental frequency and shortened duration of both sound types. Together with the known effects of temperature on hearing in the Pictus cat, the present results indicate that global warming may affect acoustic communication in fishes.
Affiliation(s)
- Friedrich Ladich
- Department of Behavioural Biology, University of Vienna, Althanstraße 14, 1090 Wien, Austria
- Isabelle Pia Maiditsch
- Department of Behavioural Biology, University of Vienna, Althanstraße 14, 1090 Wien, Austria
20
Khairalseed M, Oezdemir I, Hoyt K. Contrast-enhanced ultrasound imaging using pulse inversion spectral deconvolution. J Acoust Soc Am 2019; 146:2466. PMID: 31671995; PMCID: PMC6794155; DOI: 10.1121/1.5129115.
Abstract
A contrast-enhanced ultrasound (CEUS) imaging approach, termed pulse inversion spectral deconvolution (PISD), is introduced. The approach uses two Gaussian-weighted Hermite polynomials to form two inverted pulse sequences. The two inverted pulses are then used to filter the ultrasound (US) backscattered data and discriminate the linear and nonlinear signal components. A research US scanner equipped with a linear array transducer was used for data acquisition. The receive data from all channels are beamformed using plane wave imaging with angular compounding (from one to nine angles). In vitro data were collected with a tissue-mimicking flow phantom perfused with a US contrast agent, using PISD and traditional nonlinear (NLI) US imaging as comparison. The roles of imaging frequency (between 4.5 and 6.25 MHz) and mechanical index (from 0.1 to 0.3) were evaluated. Preliminary in vivo data were collected in the hindlimb of three healthy mice. Preliminary experimental findings indicate that the PISD contrast-to-tissue ratio was improved nearly ten times compared to the NLI US imaging approach. Also, the spatial resolution was improved due to the effect of deconvolution and spatial angular compounding. Overall, PISD is a promising postprocessing technique for real-time CEUS imaging.
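The core pulse-inversion step (summing the echoes from a pulse and its phase-inverted copy so that the linear component cancels and even-order harmonics remain) can be sketched with a toy quadratic scatterer. The pulse parameters and nonlinearity model below are illustrative, not the PISD processing chain:

```python
import numpy as np

fs, f0 = 50e6, 5e6  # sampling rate and transmit frequency (illustrative)
t = np.arange(int(2e-6 * fs)) / fs
env = np.exp(-((t - 1e-6) ** 2) / (2 * (0.2e-6) ** 2))  # Gaussian envelope
pulse = env * np.sin(2 * np.pi * f0 * t)

def scatter(p, a=1.0, b=0.2):
    """Toy scatterer: a linear term plus a weak quadratic nonlinearity
    (microbubble contrast agents respond nonlinearly; tissue is mostly
    linear)."""
    return a * p + b * p ** 2

echo_pos = scatter(pulse)
echo_neg = scatter(-pulse)        # echo from the phase-inverted transmit
pi_sum = echo_pos + echo_neg      # linear parts cancel; 2*b*p**2 remains

spec = np.abs(np.fft.rfft(pi_sum))
f = np.fft.rfftfreq(t.size, 1 / fs)
fund = spec[np.argmin(np.abs(f - f0))]        # residual at the fundamental
second = spec[np.argmin(np.abs(f - 2 * f0))]  # second harmonic
```

After the summation, energy at the fundamental is negligible while the second harmonic, dominated by the nonlinear (contrast) response, survives; PISD then adds spectral deconvolution on top of this separation.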
Affiliation(s)
- Mawia Khairalseed
- Department of Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, USA
- Ipek Oezdemir
- Department of Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, USA
- Kenneth Hoyt
- Department of Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, USA
21
Yang K, Zhou X. Deep learning classification for improved bicoherence feature based on cyclic modulation and cross-correlation. J Acoust Soc Am 2019; 146:2201. PMID: 31672017; DOI: 10.1121/1.5127166.
Abstract
This paper aims to present an improved bicoherence spectrum (IBS), combined with the cyclic modulation spectrum (CMS) and cross-correlation, that is suitable for classification of hydrophone signals with deep learning (DL). First, the proposed feature utilizes the all-phase fast Fourier transform to correct the spectrum leakage caused by CMS; this can be used to detect line spectra with low signal-to-noise ratios (SNRs). Second, the cross-correlation and bispectrum are both exploited to suppress non-periodic line-spectra interference from CMS. Based on numerous numerical simulations and experimental verification, and compared with CMS and the conventional bispectrum, the prominent characteristics of IBS include: detecting higher-precision periodic harmonics without single-line interference, superior robustness under low SNR, and greatly reduced data redundancy. In addition, to test the performance of IBS for DL application, three deep belief network (DBN)-based classifiers (DBN-softmax, DBN-support vector machine, and DBN-random forest) are introduced and employed for five experimental scenarios (including ships and an underwater source). The results indicate that, benefiting from DBN pre-training, the IBS classification accuracy of DBN-based models is generally higher than 80%.
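A direct FFT-based bicoherence estimate, the quantity the IBS feature builds on, can be sketched as follows; the all-phase FFT and cross-correlation refinements of the paper are not included, and all signal parameters are illustrative:

```python
import numpy as np

def bicoherence(x, fs, nfft=256, hop=128):
    """Direct FFT-based bicoherence estimate:
    b(f1, f2) = |sum_k X_k(f1) X_k(f2) conj(X_k(f1+f2))|, normalized to
    [0, 1]. Quadratic phase coupling between f1, f2 and f1+f2 shows up
    as a peak."""
    wins = [x[i:i + nfft] * np.hanning(nfft)
            for i in range(0, len(x) - nfft, hop)]
    X = np.array([np.fft.rfft(w) for w in wins])
    n = nfft // 4  # restrict f1, f2 so that f1 + f2 stays below Nyquist
    num = np.zeros((n, n), complex)
    den1 = np.zeros((n, n))
    den2 = np.zeros((n, n))
    for Xk in X:
        for i in range(n):
            num[i] += Xk[i] * Xk[:n] * np.conj(Xk[i:i + n])
            den1[i] += np.abs(Xk[i] * Xk[:n]) ** 2
            den2[i] += np.abs(Xk[i:i + n]) ** 2
    b = np.abs(num) / (np.sqrt(den1 * den2) + 1e-12)
    return np.fft.rfftfreq(nfft, 1 / fs)[:n], b

# Quadratically coupled test signal: tones at bins 12 and 20, plus (via
# the squaring) a phase-coupled component at their sum bin, in light noise.
fs, nfft = 1000, 256
t = np.arange(40 * nfft) / fs
fa, fb = 12 * fs / nfft, 20 * fs / nfft
s = np.sin(2 * np.pi * fa * t) + np.sin(2 * np.pi * fb * t)
x = s + 0.3 * s ** 2 + 0.1 * np.random.default_rng(1).standard_normal(t.size)
freqs, b = bicoherence(x, fs)
```

The normalization bounds the statistic by the Cauchy-Schwarz inequality, so genuinely coupled bins approach 1 while independent bins average near 1/sqrt(number of windows).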
Affiliation(s)
- Kunde Yang
- School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
- Xingyue Zhou
- School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
22
Demir F, Sengur A, Cummins N, Amiriparian S, Schuller B. Low Level Texture Features for Snore Sound Discrimination. Annu Int Conf IEEE Eng Med Biol Soc 2019; 2018:413-416. PMID: 30440421; DOI: 10.1109/embc.2018.8512459.
Abstract
Snoring is often associated with serious health risks such as obstructive sleep apnea and heart disease and may require targeted surgical interventions. In this regard, research into automatically and unobtrusively analysing the site of the blockages that cause snore sounds is growing in popularity. Herein, we investigate the use of low-level image texture features for classification of four specific types of snore sounds. Specifically, we explore histograms of local binary patterns (LBP) computed over a dense grid of rectangular regions and histograms of oriented gradients (HOG) extracted from colour spectrograms for snore sound characterisation. Support vector machines with homogeneous mapping are used in the classification stage of the proposed method. Various experiments are carried out with both LBP and HOG descriptors on the INTERSPEECH ComParE 2017 snoring sub-challenge dataset. The results indicate that LBP descriptors are better than HOG descriptors for snore type detection, and that fusion of the LBP and HOG descriptors produces stronger results than either individual descriptor. Further, when compared to the challenge baseline and state-of-the-art deep spectrum features, our approach achieved relative percentage increases in unweighted average recall of 23.1% and 8.3%, respectively.
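A minimal LBP-histogram descriptor over a dense grid of cells can be sketched in plain NumPy. This is a simplified stand-in for the paper's colour-spectrogram LBP features (one channel, basic 8-neighbour codes, no uniform-pattern reduction):

```python
import numpy as np

def lbp_histogram(img, grid=(4, 4)):
    """Basic 8-neighbour local binary patterns on a 2-D array (e.g. one
    channel of a spectrogram image), pooled into per-cell histograms
    over a dense grid and concatenated into one descriptor."""
    c = img[1:-1, 1:-1]
    neighbours = [img[0:-2, 0:-2], img[0:-2, 1:-1], img[0:-2, 2:],
                  img[1:-1, 2:],   img[2:, 2:],     img[2:, 1:-1],
                  img[2:, 0:-2],   img[1:-1, 0:-2]]
    # Each pixel's code: one bit per neighbour >= centre comparison.
    codes = sum((nb >= c).astype(np.uint8) << k
                for k, nb in enumerate(neighbours))
    H, W = codes.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = codes[i * H // grid[0]:(i + 1) * H // grid[0],
                         j * W // grid[1]:(j + 1) * W // grid[1]]
            hist, _ = np.histogram(cell, bins=256, range=(0, 256))
            feats.append(hist / max(cell.size, 1))  # per-cell normalization
    return np.concatenate(feats)

rng = np.random.default_rng(0)
spec = rng.random((64, 64))   # stand-in for a spectrogram image channel
desc = lbp_histogram(spec)
```

In the paper's setup, such per-cell histograms (and HOG descriptors) would feed an SVM with a homogeneous kernel map.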
23
Messner E, Fediuk M, Swatek P, Scheidl S, Smolle-Juttner FM, Olschewski H, Pernkopf F. Crackle and Breathing Phase Detection in Lung Sounds with Deep Bidirectional Gated Recurrent Neural Networks. Annu Int Conf IEEE Eng Med Biol Soc 2019; 2018:356-359. PMID: 30440410; DOI: 10.1109/embc.2018.8512237.
Abstract
In this paper, we present a method for event detection in single-channel lung sound recordings. This includes the detection of crackles and breathing phase events (inspiration/expiration). Therefore, we propose an event detection approach with spectral features and bidirectional gated recurrent neural networks (BiGRNNs). In our experiments, we use multichannel lung sound recordings from lung-healthy subjects and patients diagnosed with idiopathic pulmonary fibrosis, collected within a clinical trial. We achieve an event-based F-score of F1 ≈ 86% for breathing phase events and F1 ≈ 72% for crackles. The proposed method shows robustness regarding the contamination of the lung sound recordings with noise, bowel and heart sounds.
24
Zhang X, Shen J, Din ZU, Liu J, Wang G, Hu B. Multimodal Depression Detection: Fusion of Electroencephalography and Paralinguistic Behaviors Using a Novel Strategy for Classifier Ensemble. IEEE J Biomed Health Inform 2019; 23:2265-2275. PMID: 31478879; DOI: 10.1109/jbhi.2019.2938247.
Abstract
Currently, depression has become a common mental disorder and one of the main causes of disability worldwide. Because depressive symptoms vary across individuals, comprehensive and effective detection methods are in urgent demand. This study explored physiological and behavioral perspectives simultaneously and fused pervasive electroencephalography (EEG) and vocal signals to make the detection of depression more objective, effective and convenient. After extracting several effective features for these two types of signals, we trained six representative classifiers on each modality, then captured the diversity and correlation of decisions from the different classifiers using a co-decision tensor and combined these decisions into the ultimate classification result with a multi-agent strategy. Experimental results on 170 subjects (81 depressed patients and 89 normal controls) showed that the proposed multi-modal depression detection strategy is superior to single-modal classifiers and other typical late-fusion strategies in accuracy, F1-score and sensitivity. This work indicates that late fusion of pervasive physiological and behavioral signals is promising for depression detection and that the multi-agent strategy can effectively exploit the diversity and correlation of different classifiers to reach a better final decision.
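The late-fusion idea can be sketched as follows. The paper's co-decision tensor and multi-agent strategy are simplified here to an accuracy-weighted average of per-classifier posteriors, and all numbers are illustrative:

```python
import numpy as np

def late_fusion(probs, accuracies):
    """Accuracy-weighted late fusion of per-classifier posteriors,
    a simplified stand-in for the paper's co-decision-tensor /
    multi-agent combination. probs: (n_clf, n_samples, n_classes)."""
    w = np.asarray(accuracies, float)
    w = w / w.sum()                                  # normalize weights
    fused = np.tensordot(w, np.asarray(probs), axes=1)  # (n_samples, n_classes)
    return fused.argmax(axis=1)

# Three toy classifiers voting on 4 samples with 2 classes.
probs = [
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]],   # strong classifier
    [[0.6, 0.4], [0.55, 0.45], [0.3, 0.7], [0.6, 0.4]],
    [[0.5, 0.5], [0.45, 0.55], [0.6, 0.4], [0.4, 0.6]],  # weak classifier
]
decision = late_fusion(probs, accuracies=[0.85, 0.70, 0.55])
```

Weighting by held-out accuracy lets stronger modalities dominate while still allowing weaker ones to break ties, which is the intuition behind exploiting classifier diversity.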
25
Bergler C, Schröter H, Cheng RX, Barth V, Weber M, Nöth E, Hofer H, Maier A. ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning. Sci Rep 2019; 9:10997. PMID: 31358873; PMCID: PMC6662697; DOI: 10.1038/s41598-019-47335-w.
Abstract
Large bioacoustic archives of wild animals are an important source for identifying reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication of non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis - particularly for species with advanced social systems and complex vocalizations. In this study deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit ORCA-SPOT was tested on a large-scale bioacoustic repository - the Orchive - comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive recordings (about 2.2 years of audio) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables an automated annotation procedure for large bioacoustic databases to extract killer whale sounds, which are essential for subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
Affiliation(s)
- Christian Bergler
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058, Erlangen, Germany
- Hendrik Schröter
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058, Erlangen, Germany
- Rachael Xi Cheng
- Department of Ecological Dynamics, Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V., Alfred-Kowalke-Straße 17, 10315, Berlin, Germany
- Volker Barth
- Anthro-Media, Nansenstr. 19, 12047, Berlin, Germany
- Elmar Nöth
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058, Erlangen, Germany
- Heribert Hofer
- Department of Ecological Dynamics, Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V., Alfred-Kowalke-Straße 17, 10315, Berlin, Germany
- Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Takustrasse 3, 14195, Berlin, Germany
- Department of Veterinary Medicine, Freie Universität Berlin, Oertzenweg 19b, 14195, Berlin, Germany
- Andreas Maier
- Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab, Martensstr. 3, 91058, Erlangen, Germany
26
Crance JL, Berchok CL, Wright DL, Brewer AM, Woodrich DF. Song production by the North Pacific right whale, Eubalaena japonica. J Acoust Soc Am 2019; 145:3467. PMID: 31255101; DOI: 10.1121/1.5111338.
Abstract
This paper describes song production by the eastern North Pacific right whale (NPRW, Eubalaena japonica) in the southeastern Bering Sea. Songs were localized in real-time to individuals using sonobuoys. Singers whose sex could be determined were all males. Autonomous recorder data from 17 year-long deployments were analyzed to document and characterize song types. Four distinct song types were documented over eight years (2009-2017) at five distinct locations. Each song type consists of a hierarchical structure of 1-3 different repeating phrases comprised predominantly of gunshot sounds; three of the four songs contained additional sound types (downsweep, moan, and low-frequency pulsive call). Songs were detected annually (July-January); all song types remained consistent over eight years. Two different songs often occurred simultaneously, produced by different individuals; the same song was never detected simultaneously at the same location. The same song type was detected on the same day and time at two distant locations, indicating multiple individuals can produce the same song. These findings provide support that males produce song; it remains unknown if females also sing. NPRW is the first right whale species documented to produce song. Based on current knowledge about song in mysticetes, it is hypothesized that these songs are reproductive displays.
Affiliation(s)
- Jessica L Crance
- Marine Mammal Laboratory, AFSC/NMFS/NOAA, 7600 Sand Point Way Northeast, Seattle, Washington 98115, USA
- Catherine L Berchok
- Marine Mammal Laboratory, AFSC/NMFS/NOAA, 7600 Sand Point Way Northeast, Seattle, Washington 98115, USA
- Dana L Wright
- Joint Institute for the Study of the Atmosphere and Oceans, University of Washington, 3737 Brooklyn Avenue Northeast, Seattle, Washington 98195, USA
- Arial M Brewer
- Joint Institute for the Study of the Atmosphere and Oceans, University of Washington, 3737 Brooklyn Avenue Northeast, Seattle, Washington 98195, USA
- Daniel F Woodrich
- Joint Institute for the Study of the Atmosphere and Oceans, University of Washington, 3737 Brooklyn Avenue Northeast, Seattle, Washington 98195, USA
27
Gong Z, Dong L, Caruso F, Lin M, Liu M, Dong J, Li S. Echolocation signals of free-ranging pantropical spotted dolphins (Stenella attenuata) in the South China Sea. J Acoust Soc Am 2019; 145:3480. PMID: 31255156; DOI: 10.1121/1.5111742.
Abstract
Echolocation signals of free-ranging pantropical spotted dolphins (Stenella attenuata) in the western Pacific Ocean have not been studied much. This paper aims to describe the characteristics of echolocation signals of S. attenuata in the northern South China Sea. A six-arm star array with 13 hydrophones was used and a total of 131 on-axis clicks were identified to analyze the acoustic features of the echolocation signals of dolphins. The mean center frequency was 89 ± 13 kHz, with mean peak-to-peak sound source levels of 190 ± 6 dB re: 1 μPa @ 1 m. The mean -3 dB bandwidth and root-mean-square bandwidth were 62 ± 15 kHz and 26 ± 3 kHz, respectively, with mean -10 dB duration of 18 ± 4 μs and root-mean-square duration of 6 ± 2 μs. The results showed that click parameters of S. attenuata in the northern South China Sea are different from those of clicks of the species in Hawaii waters. The differences in click parameters may be due to both behavioral context and/or environmental adaptation of S. attenuata in different habitats.
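The reported click parameters (peak and centroid frequency, RMS and -3 dB bandwidth, -10 dB duration) are standard quantities that can be computed as below. The synthetic Gaussian-enveloped click is illustrative only, not the recorded data:

```python
import numpy as np

def click_params(click, fs):
    """Standard echolocation-click descriptors: peak and centroid
    frequency, RMS bandwidth, -3 dB bandwidth, and -10 dB duration."""
    spec = np.abs(np.fft.rfft(click)) ** 2          # power spectrum
    f = np.fft.rfftfreq(click.size, 1 / fs)
    peak_f = f[np.argmax(spec)]
    centroid = (f * spec).sum() / spec.sum()
    rms_bw = np.sqrt(((f - centroid) ** 2 * spec).sum() / spec.sum())
    # -3 dB bandwidth: width of the band where power >= half the peak.
    above = f[spec >= spec.max() / 2]
    bw3 = above.max() - above.min()
    # -10 dB duration from the squared-pressure envelope.
    env = click ** 2
    inside = np.flatnonzero(env >= env.max() / 10)
    dur10 = (inside[-1] - inside[0]) / fs
    return peak_f, centroid, rms_bw, bw3, dur10

# Synthetic click at 89 kHz (chosen to mirror the reported mean centre
# frequency), sampled at 500 kHz with a 20 us Gaussian envelope.
fs = 500_000
t = np.arange(256) / fs
click = (np.exp(-((t - 2.56e-4) ** 2) / (2 * (2e-5) ** 2))
         * np.sin(2 * np.pi * 89_000 * t))
peak_f, centroid, rms_bw, bw3, dur10 = click_params(click, fs)
```

Source levels and on-axis selection would additionally require calibrated hydrophone sensitivities and the array geometry described in the paper.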
Affiliation(s)
- Zining Gong, Lijun Dong, Francesco Caruso, Mingli Lin, Mingming Liu, Jianchen Dong, and Songhai Li: Sanya Key Laboratory of Marine Mammal and Marine Bioacoustics, Institute of Deep-sea Science and Engineering, Chinese Academy of Sciences, Sanya 572000, China
28
Smith AB, Pacini AF, Nachtigall PE, Laule GE, Aragones LV, Magno C, Suarez LJA. Transmission beam pattern and dynamics of a spinner dolphin (Stenella longirostris). J Acoust Soc Am 2019; 145:3595. [PMID: 31255135] [DOI: 10.1121/1.5111347]
Abstract
Toothed whales possess a sophisticated biosonar system by which ultrasonic clicks are projected in a highly directional transmission beam. Beam directivity is an important biosonar characteristic that reduces acoustic clutter and increases the acoustic detection range. This study measured click characteristics and the transmission beam pattern from a small odontocete, the spinner dolphin (Stenella longirostris). A formerly stranded individual was rehabilitated and trained to station underwater in front of a 16-element hydrophone array. On-axis clicks showed a mean duration of 20.1 μs, with mean peak and centroid frequencies of 58 and 64 kHz [standard deviation (s.d.) ±30 and ±12 kHz], respectively. Clicks were projected in an oval, vertically compressed beam, with mean vertical and horizontal beamwidths of 14.5° (s.d. ± 3.9) and 16.3° (s.d. ± 4.6), respectively. Directivity indices ranged from 14.9 to 27.4 dB, with a mean of 21.7 dB, although this likely represents a broader beam than what is normally produced by wild individuals. A click subset with characteristics more similar to those described for wild individuals exhibited a mean directivity index of 23.3 dB. Although this is one of the broadest transmission beams described for a dolphin, it is similar to those of other small-bodied odontocetes.
Affiliation(s)
- Adam B Smith, Aude F Pacini, and Paul E Nachtigall: Marine Mammal Research Program, Hawaii Institute of Marine Biology, University of Hawaii at Manoa, Kaneohe, Hawaii 96744, USA
- Gail E Laule, Carlo Magno, and Leo J A Suarez: Ocean Adventure, Camayan Wharf, West Ilanin Forest, Subic Bay Freeport Zone, Philippines
- Lemnuel V Aragones: Institute of Environmental Science and Meteorology, University of the Philippines, Diliman, Quezon City, Philippines
29
Dong L, Caruso F, Lin M, Liu M, Gong Z, Dong J, Cang S, Li S. Whistles emitted by Indo-Pacific humpback dolphins (Sousa chinensis) in Zhanjiang waters, China. J Acoust Soc Am 2019; 145:3289. [PMID: 31255103] [DOI: 10.1121/1.5110304]
Abstract
Whistles emitted by Indo-Pacific humpback dolphins in Zhanjiang waters, China, were collected using autonomous acoustic recorders. A total of 529 whistles with clear contours and signal-to-noise ratios higher than 10 dB were extracted for analysis. The fundamental frequencies and durations of the analyzed whistles were in the ranges of 1785-21 675 Hz and 30-1973 ms, respectively. Six tonal types were identified: constant, downsweep, upsweep, concave, convex, and sine whistles. The constant type was the most dominant, accounting for 32.51% of all whistles, followed by the sine type, accounting for 19.66%. This paper examined 17 whistle parameters, which showed significant differences among the six tonal types. Whistles without inflections, gaps, and stairs accounted for 62.6%, 80.6%, and 68.6% of all whistles, respectively. Significant intraspecific differences in all duration and frequency parameters of dolphin whistles were found between this study and a study in Malaysia. Except for start frequency, maximum frequency, and the number of harmonics, all whistle parameters showed significant differences between this study and a study conducted in Sanniang Bay, China. The intraspecific differences in vocalizations for this species may be related to macro-geographic and/or environmental variations among waters, suggesting a potential geographic isolation among populations of Indo-Pacific humpback dolphins.
Affiliation(s)
- Lijun Dong, Francesco Caruso, Mingli Lin, Mingming Liu, Zining Gong, Jianchen Dong, Siyuan Cang, and Songhai Li: Marine Mammal and Marine Bioacoustics Laboratory, Institute of Deep-Sea Science and Engineering, Chinese Academy of Sciences, Sanya 572000, China
30
Tremblay CJ, Van Parijs SM, Cholewiak D. 50 to 30-Hz triplet and singlet down sweep vocalizations produced by sei whales (Balaenoptera borealis) in the western North Atlantic Ocean. J Acoust Soc Am 2019; 145:3351. [PMID: 31255163] [DOI: 10.1121/1.5110713]
Abstract
The life history, distribution, and acoustic ecology of the sei whale (Balaenoptera borealis) in the western North Atlantic Ocean remain poorly understood. In this study, an array of bottom-mounted recorders captured previously undocumented low-frequency 50 to 30-Hz triplet and singlet down sweep vocalizations in close association with signature 82 to 34-Hz sei whale down sweep vocalizations. Spatiotemporal correlations of acoustically tracked sei whales confirm that these vocalizations are produced by sei whales. The 50 to 34-Hz down sweep call types were characterized with a suite of five spectral and temporal measurements. The pattern and repetition of the full acoustic suite are suggestive of song structure and warrant further investigation. The discovery of vocalizations attributed specifically to sei whales enables historic acoustic records to be re-evaluated for the presence of this species throughout its range.
Affiliation(s)
- Sofie M Van Parijs and Danielle Cholewiak: Northeast Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Woods Hole, Massachusetts 02543, USA
31
Murton O, Shattuck-Hufnagel S, Choi JY, Mehta DD. Identifying a creak probability threshold for an irregular pitch period detection algorithm. J Acoust Soc Am 2019; 145:EL379. [PMID: 31153305] [PMCID: PMC6520096] [DOI: 10.1121/1.5100911]
Abstract
Irregular pitch periods (IPPs) are associated with grammatically, pragmatically, and clinically significant types of nonmodal phonation, but are challenging to identify. Automatic detection of IPPs is desirable because accurately hand-identifying IPPs is time-consuming and requires training. The authors evaluated an algorithm developed for creaky voice analysis to automatically identify IPPs in recordings of American English conversational speech. To determine a perceptually relevant threshold probability, frame-by-frame creak probabilities were compared to hand labels, yielding a threshold of approximately 0.02. These results indicate generally good agreement between hand-labeled IPPs and automatic detection, motivating future work investigating the effects of linguistic and prosodic context.
Affiliation(s)
- Olivia Murton and Daryush D Mehta: Speech and Hearing Bioscience & Technology, Division of Medical Sciences, Harvard Medical School, Boston, Massachusetts 02115, USA
- Stefanie Shattuck-Hufnagel and Jeung-Yoon Choi: Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
32
Niu FQ, Yang YM, Zhou ZM, Wang XY, Monanunsap S, Junchompoo C. Echolocation clicks of free-ranging Irrawaddy dolphins (Orcaella brevirostris) in Trat Bay, the eastern Gulf of Thailand. J Acoust Soc Am 2019; 145:3031. [PMID: 31153324] [DOI: 10.1121/1.5100619]
Abstract
Little is known about the vocalizations of Irrawaddy dolphins (Orcaella brevirostris) in the Gulf of Thailand. The present study provides the first description of the echolocation clicks of Irrawaddy dolphins in Trat Bay, in the eastern Gulf of Thailand, obtained using a broadband hydrophone system. Over 2 h of acoustic recordings were collected during 14-day study periods in December 2017 and December 2018. Several criteria were used to judge whether a click was on axis, or as close to the acoustic axis as possible. To calculate the distance of the dolphins, a low-budget localization method based on arrival time differences between the direct and indirect signals was used. The clicks had a mean peak-to-peak source level of 192 ± 3 dB re 1 μPa, an energy flux density source level of 131 ± 3 dB re 1 μPa2s, a mean centroid frequency of 98 ± 10 kHz, a mean duration of 16 ± 2 μs, and a -3 dB bandwidth of 79 ± 13 kHz. The click parameters of the Irrawaddy dolphins in Trat Bay were slightly different from those of clicks recorded from the dolphins in the Sundarbans, Bangladesh. The present study provides a basic description of the click characteristics of Irrawaddy dolphins in Trat Bay, which could contribute to management and conservation strategies for local Irrawaddy dolphins and serve as a basic reference for the proper input parameters in passive acoustic monitoring and detection.
Affiliation(s)
- Fu Qiang Niu, Yan Ming Yang, Zai Ming Zhou, and Xian Yan Wang: Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China
- Somchai Monanunsap and Chalatip Junchompoo: Marine and Coastal Resources Research and Development Center, the Eastern Gulf of Thailand, Rayong 21170, Thailand
33
Chen WR, Whalen DH, Shadle CH. F0-induced formant measurement errors result in biased variabilities. J Acoust Soc Am 2019; 145:EL360. [PMID: 31153348] [PMCID: PMC6909981] [DOI: 10.1121/1.5103195]
Abstract
Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470 000 vowels were synthesized, mimicking statistics reported in four developmental studies, to estimate the proportion of formant variability that can be attributed to F0 bias, as well as to other formant measurement errors. The results showed that F0-induced formant measurement errors are large and systematic, and cannot be eliminated by a large sample size.
Affiliation(s)
- Wei-Rong Chen, D H Whalen, and Christine H Shadle: Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
34
Stöber U, Thomsen F. Effect of impact pile driving noise on marine mammals: A comparison of different noise exposure criteria. J Acoust Soc Am 2019; 145:3252. [PMID: 31153340] [DOI: 10.1121/1.5109387]
Abstract
Regulators in Europe and the United States have developed sound exposure criteria, ranging from broadband levels to frequency-weighted received sound levels. The associated differences in impact assessment results are, however, not yet understood. This uncertainty makes environmental management of transboundary anthropogenic noise challenging and causes confusion for regulators who need to choose appropriate exposure criteria. In the present study, three established exposure criteria frameworks from Germany, Denmark, and the US were used to analyse the effect of impact pile driving at a location in the Baltic Sea on harbor porpoise and harbor seal hearing. Acoustic modeling using MIKE showed that an unmitigated scenario would lead to auditory injury under all three criteria. Despite readily apparent variances in impact ranges among the applied approaches, it was also evident that noise mitigation measures could reduce underwater sound to levels where auditory injuries would be unlikely in most cases. It was concluded that each of the frameworks has its own advantages and disadvantages. Single noise exposure criteria follow the precautionary principle and can be enforced relatively easily, whereas criteria that consider hearing capabilities and animal response movement can improve the accuracy of the assessment if data are available.
Affiliation(s)
- Uwe Stöber: DHI WASY GmbH, Volmerstraße 8, 12489 Berlin, Germany
35
Magnúsdóttir EE, Lim R. Subarctic singers: Humpback whale (Megaptera novaeangliae) song structure and progression from an Icelandic feeding ground during winter. PLoS One 2019; 14:e0210057. [PMID: 30673737] [PMCID: PMC6343865] [DOI: 10.1371/journal.pone.0210057]
Abstract
Humpback whale songs associated with breeding behaviors are increasingly reported outside of traditional low-latitude breeding grounds. Songs from a subarctic feeding ground during the winter were quantitatively characterized to investigate the structure and temporal changes of the songs at such an atypical location. Recordings were collected from 26 January to 12 March 2011, using bottom-mounted recorders. Humpback songs were detected on 91% of the recording days, with peak singing activity during 9-26 February. The majority of the recordings included multiple chorusing singers. The songs were characterized by a) common static themes, which transitioned consistently to predictable themes, b) shifting themes, which occurred less predictably, and c) rare themes. A set median sequence was found for four different periods (sets) of recordings (approximately 1 week each). The set medians were highly similar and formed a single cluster, indicating that the sequences of themes sung in this area belonged to a single cluster of songs despite the variation caused by the shifting themes. These subarctic winter songs could thus represent a characteristic song type for this area, comparable to extensively studied songs from traditional low-latitude breeding grounds. An increase in the number of themes per sequence was observed throughout the recording period, including minor changes in the application of themes in the songs, indicating a gradual song progression. The results confirm that continual singing of sophisticated songs occurs during the breeding season in the subarctic. In addition to being a well-established summer feeding ground, the study area appears to be an important overwintering site for humpback whales that delay or cancel their migration, where males engage in active sexual display, i.e., singing. Importantly, such singing activity on a shared feeding ground likely aids the cultural transmission of songs in the North Atlantic.
Affiliation(s)
- Edda E. Magnúsdóttir and Rangyn Lim: The University of Iceland's Research Center in Húsavík, Húsavík, Iceland; Department of Life and Environmental Sciences, University of Iceland, Reykjavík, Iceland
36
Monge-Alvarez J, Hoyos-Barcelo C, San-Jose-Revuelta LM, Casaseca-de-la-Higuera P. A Machine Hearing System for Robust Cough Detection Based on a High-Level Representation of Band-Specific Audio Features. IEEE Trans Biomed Eng 2018; 66:2319-2330. [PMID: 30575527] [DOI: 10.1109/tbme.2018.2888998]
Abstract
Cough is a protective reflex conveying information on the state of the respiratory system. Cough assessment has so far been limited to subjective measurement tools or uncomfortable (i.e., non-wearable) cough monitors. This limits the potential of real-time cough monitoring to improve respiratory care. OBJECTIVE: This paper presents a machine hearing system for audio-based robust cough segmentation that can be easily deployed in mobile scenarios. METHODS: Cough detection is performed in two steps. First, a short-term spectral feature set is separately computed in five predefined frequency bands: [0, 0.5), [0.5, 1), [1, 1.5), [1.5, 2), and [2, 5.5125] kHz. Feature selection and combination are then applied to make the short-term feature set robust enough in different noisy scenarios. Second, high-level data representation is achieved by computing the mean and standard deviation of the short-term descriptors in 300 ms long-term frames. Finally, cough detection is carried out using a support vector machine trained with data from different noisy scenarios. The system is evaluated using a patient signal database that emulates three real-life scenarios in terms of noise content. RESULTS: The system achieves 92.71% sensitivity, 88.58% specificity, and 90.69% area under the receiver operating characteristic (ROC) curve (AUC), outperforming state-of-the-art methods. CONCLUSION: Our research outcome paves the way to a device for cough monitoring in real-life situations. SIGNIFICANCE: Our proposal is aligned with more comfortable and less disruptive patient monitoring, with benefits for patients (allows self-monitoring of cough symptoms), practitioners (e.g., assessment of treatments or better clinical understanding of cough patterns), and national health systems (by reducing hospitalizations).
37
Blair BD, Brindley S, Hughes J, Dinkeloo E, McKenzie LM, Adgate JL. Measuring environmental noise from airports, oil and gas operations, and traffic with smartphone applications: laboratory and field trials. J Expo Sci Environ Epidemiol 2018; 28:548-558. [PMID: 30283068] [DOI: 10.1038/s41370-018-0077-2]
Abstract
Environmental noise from sources such as traffic, airports, and oil and gas (O&G) operations is associated with nuisance and health concerns. Smartphones with external microphones have been recommended for environmental noise monitoring and may be useful tools for citizen science, but are not validated against reference methods. We evaluated laboratory performance of three smartphone/application (app) configurations recommended for environmental noise measurement. Two smartphone/app configurations were also compared to a reference sampler, a type 1 sound level meter (SLM) at ten outdoor sites with traffic, airport, and O&G noise. To evaluate performance, we compared the mean squared error, variance, bias, and Krippendorff's Alpha by smartphone/app combination and testing location for both audible (A-weighted) and low-frequency (C-weighted) noise. We observed that laboratory measurements were in strong agreement with a reference sampler. The field A-weighted noise level results had strong agreement with the SLM at several outdoor sites, but our C-weighted noise results ranged from moderate to substantial agreement. For our tested configurations, we find that smartphones with external microphones are reliable proxies for measuring A- and C-weighted noise in a laboratory setting. Outdoor performance depends on noise source type, weighting, and precision and accuracy needs of the investigation.
Affiliation(s)
- Benjamin D Blair, Stephen Brindley, John Hughes, Eero Dinkeloo, and John L Adgate: Colorado School of Public Health, University of Colorado Denver, Aurora, CO, USA
38
Leunissen EM, Webster T, Rayment W. Characteristics of vocalisations recorded from free-ranging Shepherd's beaked whales, Tasmacetus shepherdi. J Acoust Soc Am 2018; 144:2701. [PMID: 30522329] [DOI: 10.1121/1.5067380]
Abstract
Beaked whales (family Ziphiidae) are among the least studied of all the large mammals. This is especially true of Shepherd's beaked whale (Tasmacetus shepherdi), which until recently had been very rarely sighted alive, and nothing was known about the species' acoustic behaviour. Vocalisations of Shepherd's beaked whales were recorded using a hydrophone array on two separate days during marine mammal surveys of the Otago submarine canyons in New Zealand. After careful screening of the recordings, two distinct call types were found: broadband echolocation clicks and burst pulses. Broadband echolocation clicks (n = 476) had a median inter-click interval (ICI) of 0.46 s and a median peak frequency of 19.2 kHz. The burst pulses (n = 33) had a median peak frequency of constituent clicks (n = 1741) of 14.7 kHz and a median ICI of 11 ms. These results should be interpreted with caution due to the limited bandwidth used to record the signals. To the authors' knowledge, this study presents the first analysis of the characteristics of Shepherd's beaked whale sounds. It will help with identification of the species in passive acoustic monitoring records and with future efforts to further analyse this species' vocalisations.
Affiliation(s)
- Eva M Leunissen, Trudi Webster, and William Rayment: Marine Science Department, University of Otago, P.O. Box 56, Dunedin 9016, New Zealand
39
Kokabi O, Brinkmann F, Weinzierl S. Segmentation of binaural room impulse responses for speech intelligibility prediction. J Acoust Soc Am 2018; 144:2793. [PMID: 30522312] [DOI: 10.1121/1.5078598]
Abstract
The two most important aspects of binaural speech perception, better-ear listening and spatial release from masking, can be predicted well with current binaural modeling frameworks operating on head-related impulse responses, i.e., anechoic binaural signals. To incorporate effects of reverberation, a model extension was proposed that splits binaural room impulse responses into an early, useful part and a late, detrimental part before they are fed into the modeling framework. More recently, an interaction between the applied splitting time, room properties, and the resulting prediction accuracy was observed. This interaction was investigated here by measuring speech reception thresholds (SRTs) in quiet with 18 normal-hearing subjects for four simulated rooms with different reverberation times and a constant room geometry. The mean error of one of the most promising binaural prediction models could be reduced by about 1 dB by adapting the applied splitting time to room acoustic parameters. This improvement in prediction accuracy can make up a difference of 17% in absolute intelligibility within the applied SRT measurement paradigm.
Affiliation(s)
- Omid Kokabi, Fabian Brinkmann, and Stefan Weinzierl: TU Berlin, Audio Communication Group, Einsteinufer 17c, 10587 Berlin, Germany
40
DeAngelis AI, Stanistreet JE, Baumann-Pickering S, Cholewiak DM. A description of echolocation clicks recorded in the presence of True's beaked whale (Mesoplodon mirus). J Acoust Soc Am 2018; 144:2691. [PMID: 30522279] [DOI: 10.1121/1.5067379]
Abstract
True's beaked whales (Mesoplodon mirus) were encountered on two separate shipboard surveys on 24 July 2016 and 16 September 2017 in the western North Atlantic Ocean. Recordings were made using a hydrophone array towed 300 m behind the ship. In 2016, three different groups were sighted within 1500 m of the ship; clicks were recorded for 26 min. In 2017, a single group of five whales was tracked over the course of five hours, during which the ship maintained a distance <4000 m from the group. A total of 2938 frequency-modulated (FM) clicks and 7 buzzes were recorded from both encounters. Plausible inter-click intervals (ICIs) were calculated from 2763 clicks, and frequency and duration measurements were calculated from 2150 good-quality FM clicks. The median peak frequencies were 43.1 kHz (2016, n = 718) and 43.5 kHz (2017, n = 1432). Median ICIs were 0.17 s (2016) and 0.19 s (2017). The spectra and measurements of the recorded clicks closely resemble those of Gervais's beaked whale (Mesoplodon europaeus) clicks, and distinguishing between the two species in acoustic data sets proves difficult. The acoustic behavior of True's beaked whales was previously unknown; this study provides a description of the echolocation clicks produced by this species.
Affiliation(s)
- Annamaria Izzi DeAngelis: Integrated Statistics, under contract to the Northeast Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration (NOAA), 166 Water Street, Woods Hole, Massachusetts 02543, USA
- Joy E Stanistreet: Fisheries and Oceans Canada, Bedford Institute of Oceanography, 1 Challenger Drive, Dartmouth, Nova Scotia, B2Y 4A2, Canada
- Simone Baumann-Pickering: Scripps Institution of Oceanography, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0205, USA
- Danielle M Cholewiak: Northeast Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration (NOAA), 166 Water Street, Woods Hole, Massachusetts 02543, USA
41
Harakawa R, Ogawa T, Haseyama M, Akamatsu T. Automatic detection of fish sounds based on multi-stage classification including logistic regression via adaptive feature weighting. J Acoust Soc Am 2018; 144:2709. PMID: 30522274. DOI: 10.1121/1.5067373.
Abstract
This paper presents a method for automatic detection of fish sounds in an underwater environment. There are two difficulties: (i) the features and classifiers that give good detection results differ depending on the underwater environment, and (ii) in some cases the large amount of training data required for supervised machine learning cannot be prepared. The method presented in this paper (the proposed hybrid method) overcomes these difficulties as follows. First, a novel logistic regression (NLR) is derived via adaptive feature weighting, based on the accuracy of the classification results of multiple classifiers: support vector machine (SVM) and k-nearest neighbors (k-NN). Although there are cases where SVM or k-NN cannot work well due to divergence of useful features, NLR can produce complementary results. Second, the proposed hybrid method performs multi-stage classification that takes into account the accuracy of SVM, k-NN, and NLR. The multi-stage acquisition of reliable results adapts to the underwater environment, reducing the performance degradation caused by the diversity of useful classifiers even when abundant training data cannot be prepared. Experiments on underwater recordings including sounds of Sciaenidae such as silver croakers (Pennahia argentata) and blue drums (Nibea mitsukurii) show the effectiveness of the proposed hybrid method.
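The adaptive feature-weighting idea can be sketched as follows. This is a minimal illustration, weighting each feature by its standalone classification accuracy before a plain logistic regression, and is not the paper's NLR implementation or its multi-stage SVM/k-NN cascade:

```python
import numpy as np

def single_feature_accuracy(x, y):
    """Accuracy of the best threshold classifier using one feature alone."""
    best = 0.0
    for thr in np.unique(x):
        acc = max(np.mean((x >= thr) == y), np.mean((x < thr) == y))
        best = max(best, acc)
    return best

def weighted_logistic_regression(X, y, lr=0.5, epochs=500):
    """Logistic regression on features scaled by their standalone accuracy."""
    w_feat = np.array([single_feature_accuracy(X[:, j], y)
                       for j in range(X.shape[1])])
    Xw = X * w_feat                      # adaptive feature weighting
    Xb = np.c_[Xw, np.ones(len(Xw))]     # add a bias column
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):              # gradient descent on log-loss
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta -= lr * Xb.T @ (p - y) / len(y)
    return w_feat, beta, Xb @ beta > 0   # weights, coefficients, predictions

# Toy data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.c_[y + 0.1 * rng.standard_normal(100), rng.standard_normal(100)]
w, beta, pred = weighted_logistic_regression(X, y.astype(float))
print(np.mean(pred == y))  # should be near 1.0 on this separable toy set
```

In a multi-stage setting, a prediction from this model would be accepted or passed to the next classifier depending on its estimated reliability.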
Affiliation(s)
- Ryosuke Harakawa: Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan
- Takahiro Ogawa: Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan
- Miki Haseyama: Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan
- Tomonari Akamatsu: National Research Institute of Fisheries Science, Fisheries Research Agency, Yokohama, Kanagawa 236-8648, Japan
42
Yanushevskaya I, Gobl C, Ní Chasaide A. Cross-language differences in how voice quality and f0 contours map to affect. J Acoust Soc Am 2018; 144:2730. PMID: 30522326. DOI: 10.1121/1.5066448.
Abstract
The relationship between prosody and perceived affect involves multiple variables. This paper explores the interplay of three: voice quality, f0 contour, and the hearer's language background. Perception tests were conducted with speakers of Irish English, Russian, Spanish, and Japanese using three types of synthetic stimuli: (1) stimuli varied in voice quality, (2) stimuli of uniform (modal) voice quality incorporating affect-related f0 contours, and (3) stimuli combining specific non-modal voice qualities with the affect-related f0 contours of (2). The participants rated the stimuli for the presence/strength of affective colouring on six bipolar scales, e.g., happy-sad. The results suggest that stimuli incorporating non-modal voice qualities, with or without f0 variation, are generally more effective in affect cueing than stimuli varying only in f0. Along with similarities in the affective responses across these languages, many points of divergence were found, both in terms of the range and strength of affective responses overall and in terms of specific stimulus-to-affect associations. The f0 contour may play a more important role, and tense voice a lesser role, in affect signalling in Japanese and Spanish than in Irish English and Russian. The greatest cross-language differences emerged for the affects intimate, formal, stressed, and relaxed.
Affiliation(s)
- Irena Yanushevskaya: Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences, Trinity College Dublin, Dublin, Ireland
- Christer Gobl: Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences, Trinity College Dublin, Dublin, Ireland
- Ailbhe Ní Chasaide: Phonetics and Speech Laboratory, School of Linguistic, Speech and Communication Sciences, Trinity College Dublin, Dublin, Ireland
43
Kumar SP, Švec JG. A Simple Method to Obtain Basic Acoustic Measures From Video Recordings as Subtitles. J Speech Lang Hear Res 2018; 61:2196-2204. PMID: 30167666. DOI: 10.1044/2018_jslhr-s-17-0472.
Abstract
PURPOSE Sound pressure level (SPL) and fundamental frequency (fo) are basic and important measures in the acoustic assessment of voice quality, and their variation also influences vocal fold vibration characteristics. Sophisticated laryngeal videostroboscopic systems therefore measure and display SPL and fo values directly over the video frames, typically by means of a rather expensive dedicated hardware setup. An alternative, simple software-based method is presented here to obtain these measures as video subtitles. METHOD The software extracts the audio data from the video recording, calculates the SPL and fo parameters, and saves their values in a separate subtitle file. To ensure correct SPL values, the microphone signal is calibrated beforehand with a sound level meter. RESULTS The new approach was tested on videokymographic recordings obtained laryngoscopically. SPL and fo values calculated from a videokymographic recording, the creation of the subtitles, and their display are presented. CONCLUSIONS This method is useful for integrating acoustic measures with any kind of video recording containing audio data when built-in hardware means are not available. However, the calibration and other technical aspects of data acquisition and synchronization described in this article must be properly handled during recording.
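The core of such a tool is small: per-window SPL from calibrated RMS, a rough fo estimate, and SubRip-format subtitle cues. The sketch below makes illustrative choices (autocorrelation-based fo, a fixed 0.5 s window, a hypothetical calibration offset) that are not from the paper:

```python
import numpy as np

def spl_db(frame, cal_offset_db=0.0):
    """SPL from frame RMS; cal_offset_db would come from a sound level meter."""
    rms = np.sqrt(np.mean(frame ** 2))
    return 20 * np.log10(max(rms, 1e-12)) + cal_offset_db

def fo_autocorr(frame, fs, fmin=60, fmax=500):
    """Rough fo estimate (Hz) from the autocorrelation peak in [fmin, fmax]."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    return fs / (lo + np.argmax(ac[lo:hi]))

def srt_timestamp(t):
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{int(s):02d},{int((s % 1) * 1000):03d}"

def subtitles(audio, fs, win_s=0.5, cal_offset_db=94.0):
    """One SubRip (SRT) cue per analysis window, showing SPL and fo."""
    lines, n = [], int(win_s * fs)
    for i, start in enumerate(range(0, len(audio) - n, n)):
        frame = audio[start:start + n]
        t0, t1 = start / fs, (start + n) / fs
        lines += [str(i + 1),
                  f"{srt_timestamp(t0)} --> {srt_timestamp(t1)}",
                  f"SPL {spl_db(frame, cal_offset_db):.1f} dB  "
                  f"fo {fo_autocorr(frame, fs):.0f} Hz", ""]
    return "\n".join(lines)

fs = 16000
t = np.arange(fs) / fs
print(subtitles(0.1 * np.sin(2 * np.pi * 220 * t), fs))
```

Any video player that supports external subtitles can then display the values over the recording, which is the paper's point: no special hardware is needed.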
Affiliation(s)
- S Pravin Kumar: Faculty of Science, Department of Biophysics, Voice Research Lab, Palacký University, Olomouc, Czech Republic
- Jan G Švec: Faculty of Science, Department of Biophysics, Voice Research Lab, Palacký University, Olomouc, Czech Republic
44
Healy EW, Vasko JL. An ideal quantized mask to increase intelligibility and quality of speech in noise. J Acoust Soc Am 2018; 144:1392. PMID: 30424638. PMCID: PMC6136922. DOI: 10.1121/1.5053115.
Abstract
Time-frequency (T-F) masks represent powerful tools to increase the intelligibility of speech in background noise. Translational relevance is provided by their accurate estimation based only on the signal-plus-noise mixture, using deep learning or other machine-learning techniques. In the current study, a technique is designed to capture the benefits of existing techniques. In the ideal quantized mask (IQM), speech and noise are partitioned into T-F units, and each unit receives one of N attenuations according to its signal-to-noise ratio. It was found that as few as four to eight attenuation steps (IQM4, IQM8) improved intelligibility over the ideal binary mask (IBM, having two attenuation steps), and equaled the intelligibility resulting from the ideal ratio mask (IRM, having a theoretically infinite number of steps). Sound-quality ratings and rankings of noisy speech processed by the IQM4 and IQM8 were also superior to that processed by the IBM and equaled or exceeded that processed by the IRM. It is concluded that the intelligibility and sound-quality advantages of infinite attenuation resolution can be captured by an IQM having only a very small number of steps. Further, the classification-based nature of the IQM might provide algorithmic advantages over the regression-based IRM during machine estimation.
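The IQM construction can be sketched directly from the description above: each T-F unit's local SNR is quantized into one of N steps, and each step maps to a gain. The SNR range and the linear gain spacing here are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def ideal_quantized_mask(speech_tf, noise_tf, n_steps=8,
                         snr_lo=-10.0, snr_hi=10.0):
    """Assign each T-F unit one of n_steps gains based on its local SNR.

    speech_tf, noise_tf: magnitude spectrograms of the separated signals
    (available only in the 'ideal' setting; a learned model would instead
    classify the step index from the mixture).
    """
    snr = 20 * np.log10((speech_tf + 1e-12) / (noise_tf + 1e-12))
    edges = np.linspace(snr_lo, snr_hi, n_steps - 1)  # quantization edges
    step = np.digitize(snr, edges)                    # 0 .. n_steps-1
    gains = np.linspace(0.0, 1.0, n_steps)            # one gain per step
    return gains[step]

# With n_steps=2 and a single 0 dB edge this reduces to the ideal binary mask.
speech = np.array([[1.0, 0.01], [0.5, 0.001]])
noise = np.array([[0.01, 1.0], [0.05, 0.1]])
ibm = ideal_quantized_mask(speech, noise, n_steps=2, snr_lo=0.0, snr_hi=0.0)
print(ibm)  # [[1. 0.], [1. 0.]]
```

The classification-based advantage the authors mention follows from this structure: a model only has to predict one of N discrete step indices per unit, rather than regress a continuous ratio.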
Affiliation(s)
- Eric W Healy: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Jordan L Vasko: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
45
Hillier AF, Hillier CE, Hillier DA. A modified spectrogram with possible application as a visual hearing aid for the deaf. J Acoust Soc Am 2018; 144:1517. PMID: 30424670. DOI: 10.1121/1.5055224.
Abstract
A modified spectrogram was developed, substantially improving visual word recognition over that of traditional spectrograms, from 23% to 80%. Traditional spectrograms are difficult to interpret quickly due partly to poor contrast, subtle cues, and extraneous detail. The improvements developed here include increased frequency resolution, enhancement of inconspicuous but relevant information, and elimination of extraneous detail. Log-frequency and, especially, sone-amplitude scaling were subjectively easier to interpret visually than linear-frequency, dB-amplitude, and linear-amplitude scaling. The spectrogram was made small enough to fit into the center of vision, emulating written language, in which individual words are recognized as discrete patterns.
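The two scalings mentioned above are easy to illustrate. The sone conversion below uses Stevens' rule of thumb and treats dB SPL as phons, a simplification that the paper does not necessarily make:

```python
def db_to_sones(level_db):
    """Stevens' rule of thumb: 40 phons = 1 sone, and every +10 phons
    doubles perceived loudness. Treating dB SPL as phons here ignores
    equal-loudness contours (a deliberate simplification)."""
    return 2 ** ((level_db - 40.0) / 10.0)

def log_frequency_bins(fmin_hz, fmax_hz, n_bins):
    """Logarithmically spaced bin centres for a log-frequency axis."""
    ratio = (fmax_hz / fmin_hz) ** (1.0 / (n_bins - 1))
    return [fmin_hz * ratio ** k for k in range(n_bins)]

print(db_to_sones(40))                  # 1.0
print(db_to_sones(50))                  # 2.0
print(log_frequency_bins(100, 800, 4))  # approximately [100, 200, 400, 800]
```

Mapping spectrogram amplitudes through a sone-like scale compresses the dynamic range in a way that tracks perceived loudness, which is plausibly why the authors found it easier to read than dB or linear amplitude.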
Affiliation(s)
- Adeline F Hillier: Newport High School, 4333 Factoria Boulevard Southeast, Bellevue, Washington 98006, USA
- Claire E Hillier: Newport High School, 4333 Factoria Boulevard Southeast, Bellevue, Washington 98006, USA
- David A Hillier: Newport High School, 4333 Factoria Boulevard Southeast, Bellevue, Washington 98006, USA
46
Friberg A, Lindeberg T, Hellwagner M, Helgason P, Salomão GL, Elowsson A, Lemaitre G, Ternström S. Prediction of three articulatory categories in vocal sound imitations using models for auditory receptive fields. J Acoust Soc Am 2018; 144:1467. PMID: 30424637. DOI: 10.1121/1.5052438.
Abstract
Vocal sound imitations provide a new challenge for understanding the coupling between articulatory mechanisms and the resulting audio. In this study, the classification of three articulatory categories, phonation, supraglottal myoelastic vibrations, and turbulence, has been modeled from audio recordings. Two data sets were assembled, consisting of different vocal imitations by four professional imitators and four non-professional speakers in two different experiments. The audio data were manually annotated by two experienced phoneticians using a detailed articulatory description scheme. A separate set of audio features was developed specifically for each category using both time-domain and spectral methods. For all time-frequency transformations, and for some secondary processing, the recently developed Auditory Receptive Fields Toolbox was used. Three different machine learning methods were applied for predicting the final articulatory categories. The best generalization was found using an ensemble of multilayer perceptrons. The cross-validated classification accuracy was 96.8% for phonation, 90.8% for supraglottal myoelastic vibrations, and 89.0% for turbulence using all 84 developed features. A final feature reduction to 22 features yielded similar results.
Affiliation(s)
- Anders Friberg: Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Tony Lindeberg: Computational Brain Science Lab, Computational Science and Technology, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 5, 10044 Stockholm, Sweden
- Martin Hellwagner: Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Pétur Helgason: Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Gláucia Laís Salomão: Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Anders Elowsson: Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
- Guillaume Lemaitre: Institute for Research and Coordination in Acoustics and Music, 1 Place Igor Stravinsky, Paris 75004, France
- Sten Ternström: Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Lindstedtsvägen 24, 10044 Stockholm, Sweden
47
Abstract
OBJECTIVE In this paper, we accurately detect the state sequence first heart sound (S1)-systole-second heart sound (S2)-diastole, i.e., the positions of S1 and S2, in heart sound recordings. We propose an event detection approach that does not explicitly incorporate a priori information about state durations. This makes it applicable to recordings with cardiac arrhythmia and extendable to the detection of extra heart sounds (third and fourth heart sound), heart murmurs, and other acoustic events. METHODS We use data from the 2016 PhysioNet/CinC Challenge, containing heart sound recordings and annotations of the heart sound states. From the recordings, we extract spectral and envelope features and investigate the performance of different deep recurrent neural network (DRNN) architectures in detecting the state sequence. We use virtual adversarial training, dropout, and data augmentation for regularization. RESULTS We compare our results with the state-of-the-art method and achieve an average score over the four events of the state sequence of F1 ≈ 96% on an independent test set. CONCLUSION Our approach shows state-of-the-art performance, carefully evaluated on the 2016 PhysioNet/CinC Challenge dataset. SIGNIFICANCE In this work, we introduce a new methodology for the segmentation of heart sounds, suggesting an event detection approach with DRNNs using spectral or envelope features.
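The envelope features mentioned above can be sketched as follows. Frame length and smoothing are illustrative choices (not the paper's), and in the full system a DRNN would then label each frame as S1, systole, S2, or diastole:

```python
import numpy as np

def envelope_features(x, fs, frame_s=0.02):
    """Per-frame log-envelope of a heart sound signal: rectify, smooth with a
    moving average, then average within non-overlapping frames."""
    n = int(frame_s * fs)
    kernel = np.ones(n) / n
    smooth = np.convolve(np.abs(x), kernel, mode="same")  # envelope estimate
    n_frames = len(x) // n
    frames = smooth[: n_frames * n].reshape(n_frames, n).mean(axis=1)
    return np.log(frames + 1e-9)

# Synthetic test: two short bursts standing in for S1 and S2 on a quiet
# low-level background.
fs = 1000
t = np.arange(2 * fs) / fs
x = 0.01 * np.sin(2 * np.pi * 50 * t)
for c in (0.3, 0.9):
    x = x + (np.abs(t - c) < 0.05) * np.sin(2 * np.pi * 80 * t)
feat = envelope_features(x, fs)
print(int(feat.argmax()))  # index of a frame inside one of the two bursts
```

A sequence of such frame-level features (possibly stacked with spectral features) is exactly the kind of input a recurrent network can label frame by frame without any explicit duration model.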
48
Brajot FX, Lawrence D. Delay-induced low-frequency modulation of the voice during sustained phonation. J Acoust Soc Am 2018; 144:282. PMID: 30075671. DOI: 10.1121/1.5046092.
Abstract
An important property of negative feedback systems is the tendency to oscillate when feedback is delayed. This paper evaluated this phenomenon in a sustained phonation task, where subjects prolonged a vowel with 0-600 ms delays in auditory feedback. This resulted in a delay-dependent vocal wow: from 0.4 to 1 Hz fluctuations in fundamental frequency and intensity that increased in period and amplitude as the delay increased. A similar modulation in low-frequency oscillations was not observed in the first two formant frequencies, although some subjects did display increased variability. Results suggest that delayed auditory feedback enhances an existing periodic fluctuation in the voice, with a more complex, possibly indirect, influence on supraglottal articulation. These findings have important implications for understanding how speech may be affected by artificially applied or disease-based delays in sensory feedback.
Affiliation(s)
- François-Xavier Brajot: Communication Sciences and Disorders, Ohio University, Grover Center W221, Athens, Ohio 45701, USA
- Douglas Lawrence: Electrical Engineering and Computer Science, Ohio University, Stocker Center 347, Athens, Ohio 45701, USA
49
Bøttcher A, Gero S, Beedholm K, Whitehead H, Madsen PT. Variability of the inter-pulse interval in sperm whale clicks with implications for size estimation and individual identification. J Acoust Soc Am 2018; 144:365. PMID: 30075661. DOI: 10.1121/1.5047657.
Abstract
Sperm whales generate multi-pulsed clicks for echolocation and communication with an inter-pulse interval (IPI) determined by the size of their hypertrophied sound producing nose. The IPI has therefore been used to estimate body size and distinguish between individuals, and it has been hypothesized that conspecifics may use IPIs to recognize each other. However, the degree to which IPIs vary within individuals has not explicitly been tested, and therefore the inherent precision of this measure and its applicability for size estimation for researchers and sperm whales alike remain unknown. Here, the variability in IPI from both animal-borne Dtags and far-field recordings from echolocating and communicating sperm whales is quantified. Three different automatic methods (envelope, cepstrum, and cross-correlation) are tested and it is found that the envelope approach results in the least dispersion. Furthermore, it is shown that neither growth, depth, nor recording aspect fully explains the observed variability among clicks recorded from the same individual. It is proposed that dynamics in the soft structures of the nose are affecting IPIs, resulting in a variation of approximately 0.2 ms. Therefore, it is recommended that this variation be considered in IPI studies and that IPIs may have limited functionality as an identity cue among large groups of conspecifics.
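Two of the three IPI estimators compared above, the cepstrum and an envelope-based variant, can be sketched as follows. The envelope method is approximated here with an autocorrelation of the rectified click, and the lag search range is illustrative, so this is not necessarily the authors' exact procedure:

```python
import numpy as np

def ipi_cepstrum(click, fs, min_ms=2.0, max_ms=9.0):
    """IPI (ms) from the peak of the real cepstrum in a plausible lag range."""
    log_mag = np.log(np.abs(np.fft.rfft(click)) + 1e-12)
    cep = np.abs(np.fft.irfft(log_mag))
    lo, hi = int(min_ms * fs / 1000), int(max_ms * fs / 1000)
    return 1000.0 * (lo + np.argmax(cep[lo:hi])) / fs

def ipi_envelope(click, fs, min_ms=2.0, max_ms=9.0):
    """IPI (ms) from the autocorrelation of the rectified click envelope."""
    env = np.abs(click)
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lo, hi = int(min_ms * fs / 1000), int(max_ms * fs / 1000)
    return 1000.0 * (lo + np.argmax(ac[lo:hi])) / fs

# Synthetic multi-pulse click: a weaker copy of a Gaussian pulse 4 ms later.
fs = 96_000
pulse = np.exp(-((np.arange(96) - 48) ** 2) / (2 * 4.0 ** 2))
click = np.zeros(2048)
click[100:196] += pulse
d = int(0.004 * fs)                 # 4 ms inter-pulse delay
click[100 + d:196 + d] += 0.6 * pulse
print(ipi_envelope(click, fs), ipi_cepstrum(click, fs))
```

On real clicks the estimators disagree more than on this clean synthetic example, which is the variability the paper quantifies; its finding that the envelope approach shows the least dispersion is an empirical result, not something this sketch demonstrates.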
Affiliation(s)
- Anne Bøttcher: Department of Bioscience, Zoophysiology, Aarhus University, Denmark
- Shane Gero: Department of Bioscience, Zoophysiology, Aarhus University, Denmark
- Hal Whitehead: Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Peter T Madsen: Aarhus Institute of Advanced Studies, Høegh-Guldbergs Gade 6B, DK-8000 Aarhus C, Denmark
50
LeBien JG, Ioup JW. Species-level classification of beaked whale echolocation signals detected in the northern Gulf of Mexico. J Acoust Soc Am 2018; 144:387. PMID: 30075691. DOI: 10.1121/1.5047435.
Abstract
This study presents and evaluates several methods for automated species-level classification of echolocation clicks from three beaked whale species recorded in the northern Gulf of Mexico. The species included are Cuvier's and Gervais' beaked whales, as well as an unknown species denoted Beaked Whale Gulf. An optimal feature set for discriminating the three click types while also separating detected clicks from unidentified delphinids was determined using supervised step-wise discriminant analysis. Linear and quadratic discriminant analyses both achieved error rates below 1% with three features, determined by tenfold cross validation. The waveform fractal dimension was found to be a highly ranked feature among standard spectral and temporal parameters. The top-ranking features were Higuchi's fractal dimension, spectral centroid, Katz's fractal dimension, and -10 dB duration. Six clustering routines, including four popular network-based algorithms, were also evaluated as unsupervised classification methods using the selected feature set. False positive rates of 0.001 and 0.024 were achieved by Chinese Whispers and spectral clustering, respectively, across 200 randomized trials. However, Chinese Whispers clustering yielded larger false negative rates. Spectral clustering was further tested on clicks from encounters of beaked, sperm, and pilot whales in the Tongue of the Ocean, Bahamas.
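Two of the top-ranked features, Higuchi's fractal dimension and the spectral centroid, are easy to sketch in pure Python. This is a simplified illustration, not the study's implementation:

```python
import math

def higuchi_fd(x, kmax=8):
    """Higuchi's fractal dimension: the slope of log(mean curve length L(k))
    against log(1/k) over coarse-graining scales k = 1..kmax."""
    n = len(x)
    log_inv_k, log_l = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                  # k subsampled series per scale
            pts = x[m::k]
            if len(pts) < 2:
                continue
            dist = sum(abs(pts[i + 1] - pts[i]) for i in range(len(pts) - 1))
            # Higuchi's length normalization for the subsampled series
            lengths.append(dist * (n - 1) / ((len(pts) - 1) * k * k))
        log_inv_k.append(math.log(1.0 / k))
        log_l.append(math.log(sum(lengths) / len(lengths)))
    mk = sum(log_inv_k) / len(log_inv_k)
    ml = sum(log_l) / len(log_l)
    num = sum((a - mk) * (b - ml) for a, b in zip(log_inv_k, log_l))
    return num / sum((a - mk) ** 2 for a in log_inv_k)  # least-squares slope

def spectral_centroid(freqs_hz, mags):
    """Magnitude-weighted mean frequency of a spectrum."""
    return sum(f * m for f, m in zip(freqs_hz, mags)) / sum(mags)

# A straight line is one-dimensional; an irregular signal trends toward 2.
line = [0.001 * i for i in range(1000)]
print(round(higuchi_fd(line), 3))                   # 1.0
print(spectral_centroid([0, 100, 200], [0, 1, 1]))  # 150.0
```

Applied to click waveforms, a feature like this captures waveform irregularity that spectral parameters alone miss, which is consistent with its high ranking in the study's discriminant analysis.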
Affiliation(s)
- Jack G LeBien: Department of Physics, University of New Orleans, New Orleans, Louisiana 70148, USA
- Juliette W Ioup: Department of Physics, University of New Orleans, New Orleans, Louisiana 70148, USA