1
|
Ikuma T, McWhorter AJ, Oral E, Kunduk M. Formant-Aware Spectral Analysis of Sustained Vowels of Pathological Breathy Voice. J Voice 2023:S0892-1997(23)00154-6. [PMID: 37302909 DOI: 10.1016/j.jvoice.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 05/07/2023] [Accepted: 05/08/2023] [Indexed: 06/13/2023]
Abstract
OBJECTIVES This paper reports the effectiveness of formant-aware spectral parameters to predict the perceptual breathiness rating. A breathy voice has a steeper spectral slope and higher turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over lower formant regions is a known approach to capture the properties related to breathiness. This study examines this approach by testing the contemporary spectral parameters and algorithms within the framework, alternate frequency band designs, and vowel effects.
METHODS Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbruecken Voice Database were considered (n = 367). Recordings with signal irregularities, such as subharmonics or with roughness perception, were excluded from the study. Four speech language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their averages were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Five spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy, GNE, ratio) were evaluated in each band to predict the perceptual breathiness rating. Four HNR algorithms were tested.
RESULTS Multiple linear regression models of spectral parameters, led by the HNRs, were shown to explain up to 85% of the variance in perceptual breathiness ratings. This performance exceeded that of the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variance in breathiness (78%), exceeding the smoothed cepstrum peak prominence (74%). The performance of HNR was highly algorithm dependent (10% spread). Some vowel effects were observed in the perceptual rating (higher for /u/), predictability (5% lower for /u/), and model parameter selections.
CONCLUSIONS Strong per-vowel breathiness acoustic models were found by segmenting the spectrum to isolate the portion most affected by breathiness.
Affiliation(s)
- Takeshi Ikuma
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Andrew J McWhorter
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Evrim Oral
  - Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
- Melda Kunduk
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana; Dept. of Communication Sciences & Disorders, Louisiana State University, Baton Rouge, Louisiana
|
2
|
Ikuma T, Story B, McWhorter AJ, Adkins L, Kunduk M. Harmonics-to-noise ratio estimation with deterministically time-varying harmonic model for pathological voice signals. J Acoust Soc Am 2022; 152:1783. [PMID: 36182331 DOI: 10.1121/10.0014177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2022] [Accepted: 09/01/2022] [Indexed: 06/16/2023]
Abstract
The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in clinical objective voice assessment as they could indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators assume that voice signals are nearly periodic (fixed over a short period), although voice pathology could induce involuntary slow modulation that voids this assumption. This paper proposes the use of a deterministically time-varying harmonic model to improve the HNR measurements. To estimate the time-varying model, a two-stage iterative least squares algorithm is proposed to reduce model overfitting. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm can produce consistent HNR measures as the extent and rate of tremor are varied.
Affiliation(s)
- Takeshi Ikuma
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Brad Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Andrew J McWhorter
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Lacey Adkins
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Melda Kunduk
  - Department of Communication Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA
|
3
|
Gómez-García J, Moro-Velázquez L, Arias-Londoño J, Godino-Llorente J. On the design of automatic voice condition analysis systems. Part III: review of acoustic modelling strategies. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102049] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022]
|
4
|
Mehta DD, Van Stan JH, Hillman RE. Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer. IEEE/ACM Trans Audio Speech Lang Process 2016; 24:659-668. [PMID: 27066520 PMCID: PMC4826073 DOI: 10.1109/taslp.2016.2516647] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 05/10/2023]
Abstract
Monitoring subglottal neck-surface acceleration has received renewed attention due to the ability of low-profile accelerometers to confidentially and noninvasively track properties related to normal and disordered voice characteristics and behavior. This study investigated the ability of subglottal neck-surface acceleration to yield vocal function measures traditionally derived from the acoustic voice signal and help guide the development of clinically functional accelerometer-based measures from a physiological perspective. Results are reported for 82 adult speakers with voice disorders and 52 adult speakers with normal voices who produced the sustained vowels /a/, /i/, and /u/ at a comfortable pitch and loudness during the simultaneous recording of radiated acoustic pressure and subglottal neck-surface acceleration. As expected, timing-related measures of jitter exhibited the strongest correlation between acoustic and neck-surface acceleration waveforms (r ≤ 0.99), whereas amplitude-based measures of shimmer correlated less strongly (r ≤ 0.74). Additionally, weaker correlations were exhibited by spectral measures of harmonics-to-noise ratio (r ≤ 0.69) and tilt (r ≤ 0.57), whereas the cepstral peak prominence correlated more strongly (r ≤ 0.90). These empirical relationships provide evidence to support the use of accelerometers as effective complements to acoustic recordings in the assessment and monitoring of vocal function in the laboratory, clinic, and during an individual's daily activities.
Affiliation(s)
- Daryush D Mehta
  - Center for Laryngeal Surgery & Voice Rehabilitation, Massachusetts General Hospital, Boston MA 02114 USA, Department of Surgery, Harvard Medical School, Boston, MA 02115 USA, and the Institute of Health Professions, Massachusetts General Hospital, Boston, Massachusetts 02129 USA
- Jarrad H Van Stan
  - Center for Laryngeal Surgery & Voice Rehabilitation, Massachusetts General Hospital, Boston MA 02114 USA and the Institute of Health Professions, Massachusetts General Hospital, Boston, Massachusetts 02129 USA
- Robert E Hillman
  - Center for Laryngeal Surgery & Voice Rehabilitation and Institute of Health Professions, Massachusetts General Hospital, Boston MA 02114 USA and Surgery and Health Sciences & Technology, Harvard Medical School, Boston, MA 02115
|
5
|
Kacha A, Grenez F, Schoentgen J. Multiband vocal dysperiodicities analysis using empirical mode decomposition in the log-spectral domain. Biomed Signal Process Control 2015. [DOI: 10.1016/j.bspc.2014.08.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
|
6
|
Drugman T, Alku P, Alwan A, Yegnanarayana B. Glottal source processing: From analysis to applications. Comput Speech Lang 2014. [DOI: 10.1016/j.csl.2014.03.003] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Indexed: 11/16/2022]
|
7
|
Abstract
A measure of the harmonics-to-noise ratio (HNR) in voice signals is used for assisting in the classification of voice pathologies. HNR estimation for voice signals has been investigated using time domain, cepstral, Fourier series and Fourier transform analyses. The present investigation focuses on methods that use a direct application of the Fourier transform to the signal. Three approaches to obtaining a HNR index from the power spectrum are reviewed. The present study uses synthetic voice signals to provide an unambiguous assessment of methods. The study highlights the fact that even though the indices derived from the power spectrum are useful in separating pathological and normal voice data sets, they provide only indirect information regarding the HNR of the glottal signal.
Affiliation(s)
- Peter J Murphy
  - Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland.
|
8
|
|
9
|
Murphy PJ, McGuigan KG, Walsh M, Colreavy M. Investigation of a glottal related harmonics-to-noise ratio and spectral tilt as indicators of glottal noise in synthesized and human voice signals. J Acoust Soc Am 2008; 123:1642-52. [PMID: 18345852 DOI: 10.1121/1.2832651] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/16/2023]
Abstract
The harmonics-to-noise ratio (HNR) of the voiced speech signal has implicitly been used to infer information regarding the turbulent noise level at the glottis. However, two problems exist for inferring glottal noise attributes from the HNR of the speech wave form: (i) the measure is fundamental frequency (f0) dependent for equal levels of glottal noise, and (ii) any deviation from signal periodicity affects the ratio, not just turbulent noise. An alternative harmonics-to-noise ratio formulation [glottal related HNR (GHNR')] is proposed to overcome the former problem. In GHNR' a mean over the spectral range of interest of the HNRs at specific harmonic/between-harmonic frequencies (expressed in linear scale) is calculated. For the latter issue [(ii)] two spectral tilt measures are shown, using synthesis data, to be sensitive to glottal noise while at the same time being comparatively insensitive to other glottal aperiodicities. The theoretical development predicts that the spectral tilt measures reduce as noise levels increase. A conventional HNR estimator, GHNR' and two spectral tilt measures are applied to a data set of 13 pathological and 12 normal voice samples. One of the tilt measures and GHNR' are shown to provide statistically significant differentiating power over a conventional HNR estimator.
Affiliation(s)
- Peter J Murphy
  - Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland.
|
10
|
Murphy P. Source-filter Comparison of Measurements of Fundamental Frequency Perturbation and Amplitude Perturbation for Synthesized Voice Signals. J Voice 2008; 22:125-37. [PMID: 17147983 DOI: 10.1016/j.jvoice.2006.09.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/21/2006] [Accepted: 09/21/2006] [Indexed: 11/22/2022]
Abstract
SUMMARY An investigation of the effect of glottal source aperiodicities (jitter, shimmer, and aspiration noise) on the estimation of fundamental frequency (f0) perturbation and amplitude perturbation, of synthesized, glottal source and voiced speech waveforms, is considered. First, four cycle-event f0 estimators are examined: (1) waveform matching of the low-pass filtered waveform, (2) positive peaks (PPs) from the speech waveform, (3) PPs from the low-pass filtered waveform, and (4) positive zero crossings from the low-pass filtered waveform. The analysis shows that f0 perturbation measures taken from the low-pass filtered waveform are affected by both amplitude perturbation and random glottal noise, whereas f0 perturbation measures taken from the PPs of the original waveform are affected by noise but not by amplitude perturbation. It is shown for the low-pass filter methods that the effects of amplitude perturbation and noise lead to increased errors in the measurement of f0 perturbation for the synthesized speech waveforms when compared with the synthesized glottal waveforms. Shimmer of the synthesized speech waveform is approximately equal to shimmer of the synthesized glottal source. However, noise and jitter affect measures of amplitude perturbation. The estimation of f0 perturbation from the synthesized speech waveform is shown to be nonlinearly related to f0 perturbation estimation from the synthesized glottal waveform as a consequence of the filtering action of the vocal tract. Low-pass filtering the voiced speech waveform is shown to provide a partial solution to this problem.
Affiliation(s)
- Peter Murphy
  - Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland.
|
11
|
Murphy PJ, Akande OO. Noise estimation in voice signals using short-term cepstral analysis. J Acoust Soc Am 2007; 121:1679-90. [PMID: 17407904 DOI: 10.1121/1.2427123] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/14/2023]
Abstract
Cepstral-based estimation is used to provide a baseline estimate of the noise level in the logarithmic spectrum for voiced speech. A theoretical description of cepstral processing of voiced speech containing aspiration noise, together with supporting empirical data, is provided in order to illustrate the nature of the noise baseline estimation process. Taking the Fourier transform of the liftered (filtered in the cepstral domain) cepstrum produces a noise baseline estimate. It is shown that Fourier transforming the low-pass liftered cepstrum is comparable to applying a moving average (MA) filter to the logarithmic spectrum and hence the baseline receives contributions from the glottal source excited vocal tract and the noise excited vocal tract. Because the estimation process resembles the action of a MA filter, the resulting noise baseline is determined by the harmonic resolution (as determined by the temporal analysis window length) and the glottal source spectral tilt. On selecting an appropriate temporal analysis window length the estimated baseline is shown to lie halfway between the glottal excited vocal tract and the noise excited vocal tract. This information is employed in a new harmonics-to-noise (HNR) estimation technique, which is shown to provide accurate HNR estimates when tested on synthetically generated voice signals.
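The baseline step described in this abstract (low-pass liftering the cepstrum and transforming back to the log-spectral domain) can be sketched as below. This is only an illustration under stated assumptions, not the authors' estimator: the input is assumed to be a full two-sided log spectrum, the names `dft` and `cepstral_noise_baseline` and the `cutoff` parameter are hypothetical, and a naive transform is used only to keep the example dependency-free (an FFT routine such as `numpy.fft.fft` would be the usual choice).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse divides by N."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def cepstral_noise_baseline(log_spectrum, cutoff):
    """Smooth a two-sided log spectrum by low-pass liftering its cepstrum.

    Quefrency bins i with min(i, N - i) < cutoff are kept, all others are
    zeroed, and the result is transformed back to the log-spectral domain.
    """
    n = len(log_spectrum)
    cepstrum = dft(log_spectrum, inverse=True)
    liftered = [c if min(i, n - i) < cutoff else 0.0
                for i, c in enumerate(cepstrum)]
    return [value.real for value in dft(liftered)]


# Hypothetical log spectrum: a flat 5 dB level plus a harmonic-like ripple
# that repeats 8 times across the 64 bins (quefrency bin 8).
n_bins = 64
spectrum = [5.0 + 3.0 * math.cos(2 * math.pi * 8 * k / n_bins)
            for k in range(n_bins)]
baseline = cepstral_noise_baseline(spectrum, cutoff=4)
```

With the cutoff below the ripple's quefrency, the ripple is removed and the baseline follows only the slowly varying part of the log spectrum, which is the moving-average-like behavior the abstract describes.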
Affiliation(s)
- Peter J Murphy
  - Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland
|
12
|
Murphy PJ. On first rahmonic amplitude in the analysis of synthesized aperiodic voice signals. J Acoust Soc Am 2006; 120:2896-907. [PMID: 17139747 DOI: 10.1121/1.2355483] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/12/2023]
Abstract
Rahmonics comprise the prominent peaks in the cepstrum of voiced speech; their locations correspond to the fundamental period and its multiples. The amplitude of the first rahmonic, R1, has previously been used to indicate voice quality. Although a correspondence between R1 and the richness of the harmonic spectrum for voiced speech is well recognized, a formal description has remained absent. A theoretical description of rahmonic analysis of voiced speech containing aspiration noise is provided, leading to a characterization of R1. The theory suggests that R1 is directly proportional to the geometric mean harmonics-to-noise ratio (gmHNR), where the gmHNR is defined as the mean of the individual spectral (i.e., at specific frequency locations) harmonics-to-noise ratios in dB. This hypothesis is validated using synthetically generated voice signals. R1 is shown to be directly proportional to gmHNR (measured directly from the dB spectrum). It is shown that R1 (estimated from speech) is directly proportional to R1 taken from the glottal signal. R1 and gmHNR (measured spectrally) underestimate the actual gmHNR when (averaged) noise levels exceed harmonic levels. Limiting the number of harmonics in the analysis window overcomes this problem and also alleviates the (temporal) window length/f0 dependence of R1 when estimated period synchronously.
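The gmHNR definition in this abstract can be illustrated with a short sketch. The function names and sample values are hypothetical. The second function, a ratio of total harmonic power to total noise power, is included only as a point of comparison and is an assumption of this example rather than something the abstract specifies.

```python
import math


def gm_hnr_db(harmonic_power, noise_power):
    """Mean of the per-frequency harmonics-to-noise ratios, each in dB."""
    ratios_db = [10.0 * math.log10(h / n)
                 for h, n in zip(harmonic_power, noise_power)]
    return sum(ratios_db) / len(ratios_db)


def total_power_hnr_db(harmonic_power, noise_power):
    """Comparison measure (assumed here): total harmonic over total noise power."""
    return 10.0 * math.log10(sum(harmonic_power) / sum(noise_power))


# Hypothetical linear-scale powers at three frequency locations.
harmonics = [100.0, 10.0, 1.0]
noise = [1.0, 1.0, 1.0]

print(gm_hnr_db(harmonics, noise))
print(total_power_hnr_db(harmonics, noise))
```

Averaging ratios in dB equals taking 10 log10 of the geometric mean of the linear ratios, which is where the name comes from. In this sample the per-frequency ratios are 20, 10, and 0 dB, so the mean is 10 dB, while the total-power ratio is dominated by the strongest harmonic and comes out higher.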
Affiliation(s)
- Peter J Murphy
  - Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland
|
13
|
Murphy PJ, Akande OO. Cepstrum-Based Estimation of the Harmonics-to-Noise Ratio for Synthesized and Human Voice Signals. ACTA ACUST UNITED AC 2006. [DOI: 10.1007/11613107_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/16/2023]
|
14
|
Ferrer CA, González E, Hernández-Díaz ME. Correcting the use of ensemble averages in the calculation of harmonics to noise ratios in voice signals (L). J Acoust Soc Am 2005; 118:605-7. [PMID: 16158615 DOI: 10.1121/1.1940450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
A correcting formula for the estimation of harmonics-to-noise ratios (HNR) based on ensemble-averaging techniques is derived. The original method yields a biased approximation which is more accurate as the number of averaged pulses (N) increases. However, the method treats gradual waveform changes incorrectly as noise, which is worsened for large values of N. The obtained formula allows the use of as few averaged pulses as desired, while allowing the complete removal of the bias from the estimate of HNR.
Affiliation(s)
- Carlos A Ferrer
  - Center of Studies of Electronics and Information Technologies, Central University of Las Villas, C. Camajuaní, Km 5 1/2 Santa Clara, 54800 Cuba
|
15
|
Kreiman J, Gerratt BR. Perception of aperiodicity in pathological voice. J Acoust Soc Am 2005; 117:2201-11. [PMID: 15898661 DOI: 10.1121/1.1858351] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Indexed: 05/02/2023]
Abstract
Although jitter, shimmer, and noise acoustically characterize all voice signals, their perceptual importance in naturally produced pathological voices has not been established psychoacoustically. To determine the role of these attributes in the perception of vocal quality, listeners were asked to adjust levels of jitter, shimmer, and the noise-to-signal ratio in a speech synthesizer, so that synthetic voices matched naturally produced tokens. Results showed that, although listeners agreed well in their judgments of the noise-to-signal ratio, they did not agree with one another in their chosen settings for jitter and shimmer. Noise-dependent differences in listeners' ability to detect changes in amounts of jitter and shimmer implicate both listener insensitivity and inability to isolate jitter and shimmer as separate dimensions in the overall pattern of aperiodicity in a voice as causes of this poor agreement. These results suggest that jitter and shimmer are not useful as independent indices of perceived vocal quality, apart from their acoustic contributions to the overall pattern of spectrally shaped noise in a voice.
Affiliation(s)
- Jody Kreiman
  - Division of Head and Neck Surgery, UCLA School of Medicine, 31-24 Rehab Center, Los Angeles, California 90095-1794, USA.
|
16
|
Abstract
Both in normal speech voice and in some types of pathological voice, adjacent vocal cycles may alternate in amplitude or period, or both. When this occurs, the determination of voice fundamental frequency (defined as number of vocal cycles per second) becomes difficult. The present study attempts to address this issue by investigating how human listeners perceive the pitch of alternate cycles. As stimuli, vowels /a/ and /i/ were synthesized with fundamental frequencies at 140 Hz and 220 Hz, and the effect of alternate cycles was simulated with both amplitude- and frequency-modulation of the glottal volume velocity waveform. Subjects were asked to judge the pitch of the modulated vowels in reference to vowels without modulation. The results showed that (a) perceived pitch became lower as the amount of modulation increased, and the effect seems to be more dramatic than would be predicted by existing hypotheses, (b) perceived pitch differed across vowels, fundamental frequencies, and modulation types, that is, amplitude versus frequency modulation, and (c) the prediction of perceived pitch was best made in the frequency domain in terms of subharmonic-to-harmonic ratio. These findings provide useful information on how we should assess the pitch of alternate cycles. They may also be helpful in developing more robust pitch determination algorithms.
Affiliation(s)
- Xuejing Sun
  - Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois 60208, USA.
|
17
|
Vieira MN, McInnes FR, Jack MA. On the influence of laryngeal pathologies on acoustic and electroglottographic jitter measures. J Acoust Soc Am 2002; 111:1045-1055. [PMID: 11863161 DOI: 10.1121/1.1430686] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 05/23/2023]
Abstract
This study compared acoustic and electroglottographic (EGG) jitter from [a] vowels of 103 dysphonic speakers. The EGG recordings were chosen according to their intensity, signal-to-noise ratio, and percentage of unvoiced intervals, while acoustic signals were selected based on voicing detection and the reliability of jitter extraction. The agreement between jitter measures was expressed numerically as a normalized difference. In 63.1% (65/103) of the cases the differences fell within +/-22.5%. Positive differences above +22.5% were associated with increased acoustic jitter and occurred in 12.6% (13/103) of the speakers. These were, typically, cases of small nodular lesions without problems in the posterior larynx. On the other hand, substantial rises in EGG jitter leading to differences below -22.5% took place in 24.3% (25/103) of the speakers and were related to hyperfunctional voices, creaky-like voices, small laryngeal asymmetries affecting the arytenoids, or small-to-moderate glottal chinks. A clinically relevant outcome of the study was the possibility of detecting gentle laryngeal asymmetries among cases of large unilateral increase in EGG jitter. These asymmetries can be linked with vocal problems that are often overlooked in endoscopic examinations.
Affiliation(s)
- Maurílio N Vieira
  - Departamento de Física/ICEx, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil.
|
18
|
Riede T, Herzel H, Hammerschmidt K, Brunnberg L, Tembrock G. The harmonic-to-noise ratio applied to dog barks. J Acoust Soc Am 2001; 110:2191-2197. [PMID: 11681395 DOI: 10.1121/1.1398052] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.7] [Indexed: 05/23/2023]
Abstract
Dog barks are typically a mixture of regular components and irregular (noisy) components. The regular part of the signal is given by a series of harmonics and is most probably due to regular vibrations of the vocal folds, whereas noise refers to any nonharmonic (irregular) energy in the spectrum of the bark signal. The noise components might be due to chaotic vibrations of the vocal-fold tissue or due to turbulence of the air. The ratio of harmonic to nonharmonic energy in dog barks is quantified by applying the harmonics-to-noise ratio (HNR). Barks of a single dog breed were recorded in the same behavioral context. Two groups of dogs were considered: a group of ten healthy dogs (the normal sample), and a group of ten unhealthy dogs, i.e., dogs treated in a veterinary clinic (the clinic sample). Although the unhealthy dogs had no voice disease, differences in emotion or pain or impacts of surgery might have influenced their barks. The barks of the dogs were recorded for a period of 6 months. The HNR computation is based on the Fourier spectrum of a 50-ms section from the middle of the bark. A 10-point moving average curve of the spectrum on a logarithmic scale is considered as estimator of the noise level in the bark, and the maximum difference of the original spectrum and the moving average is defined as the HNR measure. It is shown that a reasonable ranking of the voices is achievable based on the measurement of the HNR. The HNR-based classification is found to be consistent with perceptual evaluation of the barks. In addition, a multiparametric approach confirms the classification based on the HNR. Hence, it may be concluded that the HNR might be useful as a novel parameter in bioacoustics for quantifying the noise within a signal.
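The HNR computation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the log-scale (dB) spectrum of the 50-ms section has already been computed, and the function names and the edge handling of the moving average (windows truncated at the ends of the spectrum) are assumptions made for this example.

```python
def moving_average(values, width=10):
    """Centered moving average; windows are truncated at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i - half + width)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def hnr_moving_average(log_spectrum_db, width=10):
    """Maximum difference between the log spectrum and its moving average.

    The moving average acts as the noise-level estimate; the largest
    excursion of the spectrum above it is taken as the HNR measure.
    """
    baseline = moving_average(log_spectrum_db, width)
    return max(s - b for s, b in zip(log_spectrum_db, baseline))


# Hypothetical dB spectrum: a flat 0 dB floor with 30 dB harmonic peaks.
spectrum_db = [0.0] * 100
for peak_bin in (10, 30, 50, 70, 90):
    spectrum_db[peak_bin] = 30.0

print(hnr_moving_average(spectrum_db))
```

Because each harmonic peak also raises the moving average around it, the measure is smaller than the true peak-to-floor distance. It is a relative index for ranking signals rather than a calibrated energy ratio.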
Affiliation(s)
- T Riede
  - Institut für Biologie, Humboldt-Universität zu Berlin, Germany.
|
19
|
Murphy PJ. Spectral characterization of jitter, shimmer, and additive noise in synthetically generated voice signals. J Acoust Soc Am 2000; 107:978-988. [PMID: 10687707 DOI: 10.1121/1.428272] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 05/23/2023]
Abstract
Alteration of the harmonic structure in voice source spectra, taken over at least two periods of the waveform, may occur due to the presence of fundamental frequency (f0) perturbation, amplitude perturbation, additive noise, or changes within the glottal source signal itself. In order to make accurate inferences regarding glottal-flow dynamics or perceptual evaluations based on spectral measurements taken from the acoustic speech waveform, investigation of the spectral features of each aperiodic component is required. Based on a heuristic development involving a consideration of the partial sum of the Fourier series taken for two periods of a jittered, shimmered, and (additive, random) noise-contaminated signal, the corresponding spectral characteristics are hypothesized. Subsequent to this, the Fourier series coefficients are calculated for the two periods in order to test the hypotheses. Definite spectral differences are found for each aperiodic component; based on these findings differential quantitative spectral measurements are suggested. Further supportive evidence is obtained through use of Fourier transform and periodogram-averaged calculations. The analysis is carried out on synthetically generated glottal-pulse waveforms and on radiated speech waveforms. A discussion of the results is given in terms of voice aperiodicity in general and in terms of their implication for future studies involving human voice signals.
Affiliation(s)
- P J Murphy
  - Department of Electronic and Computer Engineering, University of Limerick, Ireland.
|