26
|
Tran PK, Letowski TR, McBride ME. The effect of bone conduction microphone placement on intensity and spectrum of transmitted speech items. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:3900-3908. [PMID: 23742344 DOI: 10.1121/1.4803870] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
Speech signals can be converted into electrical audio signals using either a conventional air conduction (AC) microphone or a contact bone conduction (BC) microphone. The goal of this study was to investigate the effects of the location of a BC microphone on the intensity and frequency spectrum of the recorded speech. Twelve locations, 11 on the talker's head and 1 on the collar bone, were investigated. The speech sounds were three vowels (/u/, /a/, /i/) and two consonants (/m/, /ʃ/). The sounds were produced by 12 talkers. Each sound was recorded simultaneously with two BC microphones and an AC microphone. Analyzed spectral data showed that the BC recordings made at the forehead of the talker were the most similar to the AC recordings, whereas the collar bone recordings were most different. Comparison of the spectral data with speech intelligibility data collected in another study revealed a strong negative relationship between BC speech intelligibility and the degree of deviation of the BC speech spectrum from the AC spectrum. In addition, the head locations that resulted in the highest speech intelligibility were associated with the lowest output signals among all tested locations. Implications of these findings for BC communication are discussed.
|
27
|
Brunnegård K, Lohmander A, van Doorn J. Comparison between perceptual assessments of nasality and nasalance scores. INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS 2012; 47:556-566. [PMID: 22938066 DOI: 10.1111/j.1460-6984.2012.00165.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 06/01/2023]
Abstract
BACKGROUND: There are different reports of the usefulness of the Nasometer™ as a complement to listening, often as correlation calculations between listening and nasalance measurements. Differences between findings have been attributed to listener experience and types of speech stimuli. AIMS: To compare nasalance scores from the Nasometer with perceptual assessments, for the same and different Swedish speech stimuli, using three groups of listeners with differing levels of experience in judging speech nasality. METHODS & PROCEDURES: To compare nasalance scores and blinded listener ratings of randomized recordings using three groups of listeners and two groups of speakers. Speakers were either classified as having hypernasal speech or speech with typical speech resonance. Listeners were speech-language pathologists (SLPs) working predominantly with resonance disorders, other SLPs and untrained listeners. OUTCOMES & RESULTS: Correlations (r_s) between hypernasality ratings and nasalance scores for each listener group and speech stimuli were calculated. For both groups of SLPs all correlations between perceptual ratings and nasalance scores were significant at p = 0.01. The correlations between the nasalance scores and ratings by listeners in the SLP groups were higher than those for the untrained listener group regardless of stimulus type. Post-hoc Mann-Whitney U-tests showed that the only difference that was significant was expert SLP group versus untrained listener group. Secondly, correlations between perceptual ratings and oral stimulus nasalance scores were higher when the perceptual ratings were based on spontaneous speech rather than on the oral stimulus. However, a Wilcoxon signed rank test showed that the difference was not significant. A third finding was that correlations between oral stimulus nasalance scores and perceptual scores were higher than those between mixed stimulus nasalance scores and perceptual scores. A Wilcoxon signed rank test showed that the difference was significant. CONCLUSIONS & IMPLICATIONS: The Nasometer might be useful for the SLP with limited experience in assessing resonance disorders in differentiating between hyper- and hyponasality. With listener reliability for ratings of hypernasality still being an issue, the use of a nasalance score as a complement to the perceptual evaluation will also aid the expert SLP. It will give an alternative way of quantifying speech resonance and might help in especially hard to judge cases.
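The rank correlations (r_s) reported in this abstract can be illustrated with a minimal sketch of a Spearman-style computation: both variables are converted to ranks, ties share the mean rank, and a Pearson correlation is taken over the ranks. The function names are hypothetical and the values used in any call would be illustrative only; none of this is taken from the study.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For example, `spearman(listener_ratings, nasalance_scores)` would give one coefficient per listener group and stimulus type, where both argument names are placeholders for equal-length lists.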
|
28
|
Liker M, Horga D, Mildner V. Electropalatographic specification of Croatian fricatives /s/ and /z/. CLINICAL LINGUISTICS & PHONETICS 2012; 26:199-215. [PMID: 21967279 DOI: 10.3109/02699206.2011.602460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Electropalatographic specification of alveolar fricatives in Croatian is aimed at providing speech therapists with normative data about the range of acceptable productions of /s/ and /z/ in adult speakers of Croatian. Four variables were analysed: place of articulation, total contact, groove width and hold phase duration. Intra- and inter-speaker variability for each variable was analysed. Lingual palatal cues for voicing difference were also quantified and discussed. Results show that Croatian /s/ and /z/ are alveolar and not dental as previously reported. The comparison between the voiced and the voiceless fricative shows that durational measures provide the best differentiation. The voiceless counterpart is significantly longer. The difference between voiced and voiceless is also found in the total contact, with /z/ having more contact in the anterior four rows of electrodes, while /s/ has more contact in the posterior four rows of electrodes. This difference is also reflected in the anterior and the posterior groove widths. Possibilities of using these results as normative data for the diagnosis and treatment of atypical articulation of /s/ and /z/ are discussed.
|
29
|
Myers FL, Bakker K, St Louis KO, Raphael LJ. Disfluencies in cluttered speech. JOURNAL OF FLUENCY DISORDERS 2012; 37:9-19. [PMID: 22325918 DOI: 10.1016/j.jfludis.2011.10.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/26/2011] [Revised: 07/14/2011] [Accepted: 10/08/2011] [Indexed: 05/31/2023]
Abstract
The purpose of this study was to examine the nature and frequency of occurrence of disfluencies, as they occur in singletons and in clusters, in the conversational speech of individuals who clutter compared to typical speakers. Except for two disfluency types (revisions in clusters, and word repetitions in clusters) nearly all disfluency types were virtually indistinguishable in frequency of occurrence between the two groups. These findings shed light on cluttering in several respects, foremost of which is that they provide documentation on the nature of disfluencies in cluttering. Findings also have implications for our understanding of the relationship between cluttering and typical speech, cluttering and stuttering, the Cluttering Spectrum Hypothesis, as well as the Lowest Common Denominator definition of cluttering. EDUCATIONAL OBJECTIVES: At the end of this activity the reader will be able to: (a) identify types of disfluency associated with cluttered speech; (b) contrast disfluencies in cluttered speech with those associated with stuttering; (c) compare the disfluencies of typical speakers with those of speakers who clutter; (d) explain the perceptual nature of cluttering.
|
30
|
Fitzsimons DA, Jones DL, Barton B, North KN. A procedure for the computerized analysis of cleft palate speech transcription. CLINICAL LINGUISTICS & PHONETICS 2012; 26:18-38. [PMID: 21728832 DOI: 10.3109/02699206.2011.584270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
The phonetic symbols used by speech-language pathologists to transcribe speech contain underlying hexadecimal values used by computers to correctly display and process transcription data. This study aimed to develop a procedure to utilise these values as the basis for subsequent computerized analysis of cleft palate speech. A computer keyboard file and a modified font file were developed using symbols from the International Phonetic Alphabet and extensions to the International Phonetic Alphabet to improve the computerized storage of phonetic symbols used in cleft palate speech transcription. Computerized coding procedures were written to retrieve hexadecimal values of transcribed symbols and match these to their phonetic attributes as defined in the International Phonetic Alphabet and extensions to the International Phonetic Alphabet. Computerized procedures were subsequently developed to analyse transcription data based on these matched hexadecimal values and their associated phonetic attributes, with respect to cleft palate speech. This method will be a useful addition to existing computerized speech analysis tools.
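The idea of retrieving a symbol's hexadecimal value and matching it to phonetic attributes can be sketched as follows. This is an illustration only: it assumes standard Unicode code points rather than the study's modified font file, and the small attribute table and function names are hypothetical, not the authors' coding scheme.

```python
# Hypothetical lookup table keyed by Unicode code point (assumption: the
# study's own font-specific values and attribute set are not given here).
PHONETIC_ATTRIBUTES = {
    0x0070: ("voiceless", "bilabial", "plosive"),        # p
    0x0283: ("voiceless", "postalveolar", "fricative"),  # ʃ
    0x014B: ("voiced", "velar", "nasal"),                # ŋ
    0x0294: ("voiceless", "glottal", "plosive"),         # ʔ
}


def code_points(transcription):
    """Return the integer code point of every character in the string."""
    return [ord(ch) for ch in transcription]


def describe(transcription):
    """Pair each symbol with its hex value and attributes (None if unknown)."""
    rows = []
    for ch in transcription:
        value = ord(ch)
        rows.append((ch, f"{value:04X}", PHONETIC_ATTRIBUTES.get(value)))
    return rows
```

Calling `describe("pʔ")` would list each symbol alongside its four-digit hexadecimal value and whatever attributes the table holds for it; symbols missing from the table come back with `None` so they can be flagged for review.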
|
31
|
Hong WH, Chen HC, Yang FPG, Wu CY, Chen CL, Wong AMK. Speech-associated labiomandibular movement in Mandarin-speaking children with quadriplegic cerebral palsy: a kinematic study. RESEARCH IN DEVELOPMENTAL DISABILITIES 2011; 32:2595-2601. [PMID: 21775100 DOI: 10.1016/j.ridd.2011.06.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/21/2011] [Accepted: 06/24/2011] [Indexed: 05/31/2023]
Abstract
The purpose of this study was to investigate the speech-associated labiomandibular movement during articulation production in Mandarin-speaking children with spastic quadriplegic (SQ) cerebral palsy (CP). Twelve children with SQ CP (aged 7-11 years) and 12 age-matched healthy children as controls were enrolled for the study. All children underwent analysis of percentage of consonants correct (PCC) and kinematic analysis of speech tasks using the Vicon Motion 370 system. Kinematic parameters included utterance duration, displacement and velocity of the lip and jaw, coefficient of variation (CV) of lip utterance duration, and spatial and temporal coupling of labiomandibular movement of speech produced in mono-syllable (MS) and poly-syllable (PS) tasks. Children with CP showed lower temporal coupling (MS, p = 0.015; PS, p = 0.007), but not spatial coupling, of labiomandibular movement than healthy children. Children with CP had greater CVs (MS, p = 0.003; PS, p = 0.010), greater peak opening displacement and velocity of the lower lip and jaw (p < 0.05), and lower PCC (p < 0.001) than healthy children. Children with SQ CP displayed labiomandibular coupling movement impairment, especially in the aspect of temporal coupling. These children also had high temporal oromotor variability and needed to make more effort to coordinate the labiomandibular movement for speech production.
|
32
|
Awan SN, Omlor K, Watts CR. Effects of computer system and vowel loading on measures of nasalance. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:1284-1294. [PMID: 21498579 DOI: 10.1044/1092-4388(2011/10-0201)] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 05/30/2023]
Abstract
PURPOSE: The purpose of this study was to determine similarities and differences in nasalance scores observed with different computerized nasalance systems in the context of vowel-loaded sentences. METHODOLOGY: Subjects were 46 Caucasian adults with no perceived hyper- or hyponasality. Nasalance scores were obtained using the Nasometer 6200 (Kay Elemetrics Corp.), the Nasometer II 6400 (Kay Elemetrics Corp.), and the NasalView (Tiger DRS, Inc.) for sentences loaded with mixed, high front, high back, low front, or low back vowels. RESULTS: Measures of nasalance obtained with the NasalView were significantly higher than those obtained with the Nasometer 6200, and the measures of nasalance obtained with the Nasometer 6200 were significantly higher than those obtained with the Nasometer II 6400. However, similar effects of vowel loading on measures of nasalance were observed, regardless of system. For all systems, the high front vowel sentence tended to result in higher measures of nasalance than did the high back, low front, and low back vowel sentences; the mixed vowel sentence tended to have a higher degree of nasalance than did any of the other sentences. CONCLUSIONS: Although nasalance data computed using different systems are not readily comparable, all three systems that were evaluated produced similar effects of vowel loading on nasalance. Increased nasalance for high front versus low back vowels may be due to factors such as increased oral impedance, reduced radiated oral sound pressure, possible increases in airflow via the nasal cavity, and increased transpalatal nasalance.
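For orientation, a nasalance score is conventionally the nasal acoustic energy expressed as a percentage of the combined nasal and oral energy. The sketch below shows only that ratio; the function name is hypothetical, and the system-specific microphones, filtering, and averaging that make scores differ between the three devices are deliberately not modelled.

```python
def nasalance_percent(nasal_energy, oral_energy):
    """Nasalance as nasal / (nasal + oral) * 100.

    Inputs are non-negative energy (or amplitude) measures taken over the
    same analysis window; how each commercial system conditions its signals
    is an open detail here.
    """
    if nasal_energy < 0 or oral_energy < 0:
        raise ValueError("energies must be non-negative")
    total = nasal_energy + oral_energy
    if total == 0:
        raise ValueError("no acoustic energy in the analysis window")
    return 100.0 * nasal_energy / total
```

Because the score is a ratio, a drop in radiated oral energy raises nasalance even when nasal energy is unchanged, which is consistent with one of the explanations the abstract offers for high front vowels.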
|
33
|
Berry JJ. Accuracy of the NDI wave speech research system. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:1295-1301. [PMID: 21498575 DOI: 10.1044/1092-4388(2011/10-0226)] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 05/30/2023]
Abstract
PURPOSE: This work provides a quantitative assessment of the positional tracking accuracy of the NDI Wave Speech Research System. METHOD: Three experiments were completed: (a) static rigid-body tracking across different locations in the electromagnetic field volume, (b) dynamic rigid-body tracking across different locations within the electromagnetic field volume, and (c) human jaw-movement tracking during speech. Rigid-body experiments were completed for 4 different instrumentation settings, permuting 2 electromagnetic field volume sizes with and without automated reference sensor processing. RESULTS: Within the anthropometrically pertinent "near field" (< 200 mm) of the NDI Wave field generator, at the 300-mm³ volume setting, 88% of dynamic positional errors were < 0.5 mm and 98% were < 1.0 mm. Extreme tracking errors (> 2 mm) occurred within the near field for < 1% of position samples. For human jaw-movement tracking, 95% of position samples had < 0.5 mm errors for 9 out of 10 subjects. CONCLUSIONS: Static tracking accuracy is modestly superior to dynamic tracking accuracy. Dynamic tracking accuracy is best for the 300-mm³ field setting in the 200-mm near field. The use of automated head correction has no deleterious effect on tracking. Tracking errors for jaw movements during speech are typically < 0.5 mm.
|
34
|
Miller AL, Finch KB. Corrected high-frame rate anchored ultrasound with software alignment. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:471-486. [PMID: 20884781 DOI: 10.1044/1092-4388(2010/09-0103)] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
PURPOSE: To improve lingual ultrasound imaging with the corrected high frame rate anchored ultrasound with software alignment (CHAUSA; Miller, 2008) method. METHOD: A production study of the IsiXhosa alveolar click is presented. Articulatory-to-acoustic alignment is demonstrated using a Tri-Modal 3-ms pulse generator. Images from 2 simultaneous data collection paths, using dominant ultrasound technology and the CHAUSA method, are compared. The probe stabilization and head movement correction paradigm is demonstrated. RESULTS: The CHAUSA method increases the frame rate from the standard National Television System Committee (NTSC) video rate (29.97 frames per second [fps]) to the ultrasound internal machine rate, in this case 124 fps, by using Digital Imaging and Communications in Medicine (DICOM; National Electrical Manufacturers Association, 2008) data transfer. DICOM avoids spatiotemporal inaccuracies introduced by dominant ultrasound export techniques. The data display alignment of the acoustic and articulatory signals to the correct high-frame rate (FR) frame (± 4 ms at 124 fps). CONCLUSIONS: CHAUSA produces high-FR, high-spatial-quality ultrasound images, which are head corrected to 1 mm. The method reveals tongue dorsum retraction during the posterior release of the alveolar click and tongue tip recoil following the anterior release of the alveolar click, both of which were previously undetectable. CHAUSA visualizes most of the tongue in studies of dynamic consonants with a major reduction in field problems, opening up important areas of speech research.
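The relationship between frame rate and the quoted alignment tolerance can be checked with simple arithmetic. The sketch below assumes, as one plausible reading rather than something the abstract states, that the ± 4 ms figure is half of one frame interval at 124 fps; the function names are hypothetical.

```python
def frame_interval_ms(frames_per_second):
    """Duration of one frame in milliseconds."""
    return 1000.0 / frames_per_second


def half_frame_ms(frames_per_second):
    """Largest offset between an event and the centre of its nearest frame."""
    return frame_interval_ms(frames_per_second) / 2.0


if __name__ == "__main__":
    for fps in (29.97, 124.0):
        print(f"{fps:7.2f} fps: frame = {frame_interval_ms(fps):6.3f} ms, "
              f"half frame = {half_frame_ms(fps):6.3f} ms")
```

Under that assumption, one frame at 124 fps lasts about 8.06 ms, giving a half-frame offset of about 4.03 ms, whereas one NTSC frame lasts about 33.37 ms.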
|
35
|
Folker JE, Murdoch BE, Cahill LM, Delatycki MB, Corben LA, Vogel AP. Kinematic analysis of lingual movements during consonant productions in dysarthric speakers with Friedreich's ataxia: A case-by-case analysis. CLINICAL LINGUISTICS & PHONETICS 2011; 25:66-79. [PMID: 20932172 DOI: 10.3109/02699206.2010.511760] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Articulatory kinematics were investigated using electromagnetic articulography (EMA) in four dysarthric speakers with Friedreich's ataxia (FRDA). Specifically, tongue-tip and tongue-back movements were recorded by the AG-200 EMA system during production of the consonants /t/ and /k/ as produced within a sentence utterance and during a rapid syllable repetition task. The results obtained for each of the participants with FRDA were individually compared to those obtained by a control group (n = 10). Results revealed significantly greater movement durations and increased articulatory distances, most predominantly during the approach phase of consonant production. A task difference was observed with lingual kinematics more disturbed during the syllable repetition task than during the sentence utterance. Despite expectations of slowed articulatory movements in FRDA dysarthria, the EMA data indicated that the observed prolongation of consonant phase durations was generally associated with greater articulatory distances, rather than slowed movement execution.
|
36
|
Oller DK, Niyogi P, Gray S, Richards JA, Gilkerson J, Xu D, Yapanel U, Warren SF. Automated vocal analysis of naturalistic recordings from children with autism, language delay, and typical development. Proc Natl Acad Sci U S A 2010; 107:13354-9. [PMID: 20643944 PMCID: PMC2922144 DOI: 10.1073/pnas.1003882107] [Citation(s) in RCA: 183] [Impact Index Per Article: 13.1] [Indexed: 12/27/2022] Open
Abstract
For generations the study of vocal development and its role in language has been conducted laboriously, with human transcribers and analysts coding and taking measurements from small recorded samples. Our research illustrates a method to obtain measures of early speech development through automated analysis of massive quantities of day-long audio recordings collected naturalistically in children's homes. A primary goal is to provide insights into the development of infant control over infrastructural characteristics of speech through large-scale statistical analysis of strategically selected acoustic parameters. In pursuit of this goal we have discovered that the first automated approach we implemented is not only able to track children's development on acoustic parameters known to play key roles in speech, but also is able to differentiate vocalizations from typically developing children and children with autism or language delay. The method is totally automated, with no human intervention, allowing efficient sampling and analysis at unprecedented scales. The work shows the potential to fundamentally enhance research in vocal development and to add a fully objective measure to the battery used to detect speech-related disorders in early childhood. Thus, automated analysis should soon be able to contribute to screening and diagnosis procedures for early disorders, and more generally, the findings suggest fundamental methods for the study of language in natural environments.
|
37
|
O'Brian S, Jones M, Pilowsky R, Onslow M, Packman A, Menzies R. A new method to sample stuttering in preschool children. INTERNATIONAL JOURNAL OF SPEECH-LANGUAGE PATHOLOGY 2010; 12:173-177. [PMID: 20433336 DOI: 10.3109/17549500903464338] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/29/2023]
Abstract
This study reports a new method for sampling the speech of preschool stuttering children outside the clinic environment. Twenty parents engaged their stuttering children in an everyday play activity in the home with a telephone handset nearby. A remotely located researcher telephoned the parent and recorded the play session with a phone-recording jack attached to a digital audio recorder at the remote location. The parent placed an audio recorder near the child for comparison purposes. Children as young as 2 years complied with the remote method of speech sampling. The quality of the remote recordings was superior to that of the in-home recordings. There was no difference in means or reliability of stutter-count measures made from the remote recordings compared with those made in-home. Advantages of the new method include: (1) cost efficiency of real-time measurement of percent syllables stuttered in naturalistic situations, (2) reduction of bias associated with parent-selected timing of home recordings, (3) standardization of speech sampling procedures, (4) improved parent compliance with sampling procedures, (5) clinician or researcher on-line control of the acoustic and linguistic quality of recordings, and (6) elimination of the need to lend equipment to parents for speech sampling.
|
38
|
McNeil MR, Katz WF, Fossett TRD, Garst DM, Szuminsky NJ, Carter G, Lim KY. Effects of online augmented kinematic and perceptual feedback on treatment of speech movements in apraxia of speech. Folia Phoniatr Logop 2010; 62:127-33. [PMID: 20424468 PMCID: PMC2871060 DOI: 10.1159/000287211] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/19/2022] Open
Abstract
Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and temporal parameters of movement. Research on motor learning suggests that augmented feedback may provide a beneficial effect for training movement. This study examined the effects of the presence and frequency of online augmented visual kinematic feedback (AVKF) and clinician-provided perceptual feedback on speech accuracy in 2 adults with acquired AOS. Within a single-subject multiple-baseline design, AVKF was provided using electromagnetic midsagittal articulography (EMA) in 2 feedback conditions (50 or 100%). Articulator placement was specified for speech motor targets (SMTs). Treated and baselined SMTs were in the initial or final position of single-syllable words, in varying consonant-vowel or vowel-consonant contexts. SMTs were selected based on each participant's pre-assessed erred productions. Productions were digitally recorded and online perceptual judgments of accuracy (including segment and intersegment distortions) were made. Inter- and intra-judge reliability for perceptual accuracy was high. Results measured by visual inspection and effect size revealed positive acquisition and generalization effects for both participants. Generalization occurred across vowel contexts and to untreated probes. Results of the frequency manipulation were confounded by presentation order. Maintenance of learned and generalized effects was demonstrated for 1 participant. These data provide support for the role of augmented feedback in treating speech movements that result in perceptually accurate speech production. Future investigations will explore the independent contributions of each feedback type (i.e. kinematic and perceptual) in producing efficient and effective training of SMTs in persons with AOS.
|
39
|
Morrow SL, Connor NP. Comparison of voice-use profiles between elementary classroom and music teachers. J Voice 2010; 25:367-72. [PMID: 20359861 DOI: 10.1016/j.jvoice.2009.11.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 09/11/2009] [Accepted: 11/10/2009] [Indexed: 11/18/2022]
Abstract
Among teachers, music teachers are roughly four times more likely than classroom teachers to develop voice-related problems. Although it has been established that music teachers use their voices at high intensities and durations in the course of their workday, voice-use profiles concerning the amount and intensity of vocal use and vocal load have neither been quantified nor has vocal load for music teachers been compared with classroom teachers using these same voice-use parameters. In this study, total phonation time, fundamental frequency (F₀), and vocal intensity (dB SPL [sound pressure level]) were measured or estimated directly using a KayPENTAX Ambulatory Phonation Monitor (KayPENTAX, Lincoln Park, NJ). Vocal load was calculated as cycle and distance dose, as defined by Švec et al (2003), which integrates total phonation time, F₀, and vocal intensity. Twelve participants (n = 7 elementary music teachers and n = 5 elementary classroom teachers) were monitored during five full teaching days of one workweek to determine average vocal load for these two groups of teachers. Statistically significant differences in all measures were found between the two groups (P < 0.05) with large effect sizes for all parameters. These results suggest that typical vocal loads for music teachers are substantially higher than those experienced by classroom teachers (P < 0.01). This study suggests that reducing vocal load may have immediate clinical and educational benefits in vocal health in music teachers.
|
40
|
Sawigun C, Ngamkham W, Serdijn WA. Comparison of speech processing strategies for the design of an ultra low-power analog bionic ear. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2010; 2010:1374-1377. [PMID: 21096335 DOI: 10.1109/iembs.2010.5626737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
Minimizing the area and power consumption of cochlear prosthetic devices is essential for full implantation. In this paper, several speech encoding strategies are studied and compared in order to find a compact speech processor that allows for full implantation and is able to convey both time and frequency components of the incoming speech to a set of electrical pulse stimuli. The study covers the widely recognized continuous time interleaved sampling (CIS) and strategies that convey the temporal fine structure (TFS), including race-to-spike asynchronous interleaved sampling (AIS), phase-locking (PL) using zero-crossing detection (ZCD), and PL using a peak-picking (PP) technique. To estimate the performance of the four systems, a spike-based reconstruction algorithm is employed to retrieve the original sounds after they have been processed by the different strategies. The correlation factors between the reconstructed and original signals indicate that strategies that convey TFS outperform CIS. Among them, the peak-picking technique combines good performance with great compactness, since envelope detectors are not required.
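Zero-crossing detection and peak picking both reduce a band-limited signal to a train of event times that follow its temporal fine structure. The following is a minimal digital sketch of the two event detectors on a sampled test tone; the paper concerns analog circuit implementations, so the function names, the test tone, and the sampling parameters here are illustrative assumptions only.

```python
import math


def positive_zero_crossings(x):
    """Indices n where the signal goes from negative to non-negative."""
    return [n for n in range(1, len(x)) if x[n - 1] < 0 <= x[n]]


def peak_indices(x):
    """Indices of local maxima (rising into the sample, not rising after it)."""
    return [n for n in range(1, len(x) - 1) if x[n - 1] < x[n] >= x[n + 1]]


def tone(freq_hz, sample_rate_hz, n_samples, phase=0.1):
    """Sampled sine wave used as a stand-in for one filter-bank channel."""
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [math.sin(step * n + phase) for n in range(n_samples)]


if __name__ == "__main__":
    x = tone(5.0, 1000.0, 1000)
    print("zero-crossing events:", positive_zero_crossings(x))
    print("peak events:", peak_indices(x))
```

In both cases the spacing between successive events equals the period of the tone, which is the timing information a phase-locking strategy would pass on as stimulation pulses.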
|
41
|
Deng X, Chen J, Shuai J. [Detection of endpoint for segmentation between consonants and vowels in aphasia rehabilitation software based on artificial intelligence scheduling]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2009; 26:886-899. [PMID: 19813633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
To improve the efficiency of aphasia rehabilitation training, an artificial intelligence scheduling function was added to the aphasia rehabilitation software, improving the software's performance. Taking into account the characteristics of aphasia patients' voices and the requirements of the artificial intelligence scheduling function, the authors designed an endpoint detection algorithm. It first determines reference endpoints, then uses them to extract each word and to set reasonable segmentation points between consonants and vowels. Experimental results show that the algorithm detects the targets at a higher accuracy rate. It is therefore applicable to endpoint detection in the speech of patients with aphasia.
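The abstract does not describe how the endpoints are computed. As a generic illustration only, and not the authors' algorithm, a common baseline for endpoint detection marks the first and last frames whose short-time energy exceeds a fraction of the peak frame energy. The function names, frame length, and threshold below are all hypothetical.

```python
def frame_energies(samples, frame_len):
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len=4, rel_threshold=0.1):
    """Return (start, end) sample indices of the active region, or None.

    A frame counts as active when its energy is at least rel_threshold
    times the largest frame energy; `end` is exclusive.
    """
    energies = frame_energies(samples, frame_len)
    if not energies or max(energies) == 0:
        return None
    threshold = rel_threshold * max(energies)
    active = [i for i, e in enumerate(energies) if e >= threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

A consonant-vowel boundary inside the detected word would need further cues, such as a change in energy or zero-crossing rate between frames, which this sketch leaves out.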
|
42
|
Winkler R, Sendlmeier W. EGG open quotient in aging voices--changes with increasing chronological age and its perception. LOGOP PHONIATR VOCO 2009; 31:51-6. [PMID: 16754276 DOI: 10.1080/14015430500445534] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 10/25/2022]
Abstract
This paper presents the results of open quotient (OQ) measurements in electroglottographic (EGG) signals of young (18-30 years) and elderly (60-82 years) male and female speakers. The paper further presents quantitative results of the relation between the EGG OQ and the perception of a speaker's age. Higgins and Saxman found a decreased EGG OQ with increased age for females, while the EGG OQ increased for males as the speaker's age increased in sustained vowel material [1]. Although laryngeal degeneration due to increased age seems to occur to a lesser extent in females, the significant decrease of the OQ in elderly female voices could not be explained in terms of age-related physiological changes. Linville found increased spectral amplitudes in the region of F0 for the elderly (obtained by long-term average spectra (LTAS) measurements of read speech material), independent of gender, which could be indirectly interpreted as an increasing OQ [3]. We measured the EGG OQ, not only for sustained vowels but also in vowels taken from isolated words and read speech material. To analyse the relation between breathiness in terms of an increased EGG OQ and the mean perceived age per stimulus, a perception test was carried out, in which listeners were asked to estimate speaker's age based on sustained /a/-vowels varying in vocal effort (soft-normal-loud) during production. 1) The decreased EGG OQ for elderly females originally found by Higgins and Saxman [1] is not apparent in our data for sustained /a/-vowels; for males, however, we also found an increased EGG OQ for the elderly speakers. 2) In addition, an increased EGG OQ for the group of elderly in comparison to the younger males occurs for the unstressed syllable of the word material. 3) Our results show a strong positive relation between perceived age and EGG OQ in male vowel stimuli. Regarding 2), depending on the speech task, at least a male speaker's voice becomes breathier as age increases. Considering 3), increased breathiness may contribute to the listener's perception of increased age.
|
43
|
Herbst C, Ternström S. A comparison of different methods to measure the EGG contact quotient. LOGOP PHONIATR VOCO 2009; 31:126-38. [PMID: 16966155 DOI: 10.1080/14015430500376580] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Indexed: 10/25/2022]
Abstract
The results from six published electroglottographic (EGG)-based methods for calculating the EGG contact quotient (CQEGG) were compared to closed quotients derived from simultaneous videokymographic imaging (CQKYM). Two trained male singers phonated in falsetto and in chest register, with two degrees of adduction in both registers. The maximum difference between methods in the CQEGG was 0.3 (out of 1.0). The CQEGG was generally lower than the CQKYM. Within subjects, the CQEGG co-varied with the CQKYM, but with changing offsets depending on method. The CQEGG cannot be calculated for falsetto phonation with little adduction, since there is no complete glottal closure. Basic criterion-level methods with thresholds of 0.2 or 0.25 gave the best match to the CQKYM data. The results suggest that contacting and de-contacting in the EGG might not refer to the same physical events as do the beginning and cessation of airflow.
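As a rough illustration of the criterion-level idea mentioned in this abstract, the sketch below estimates a contact quotient as the fraction of one EGG cycle lying above a level set at a given fraction of the cycle's peak-to-peak amplitude. The function name `contact_quotient`, the synthetic cycle, and the simple sample-counting shortcut are assumptions made for illustration; they are not taken from the paper and may differ from the six published methods it compares.

```python
def contact_quotient(cycle, threshold=0.25):
    """Estimate a contact quotient for ONE glottal cycle of an EGG signal.

    cycle     -- samples of a single cycle (assumed: larger value = more contact)
    threshold -- criterion level as a fraction of peak-to-peak amplitude
    Returns the fraction of samples strictly above the criterion level.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        # A flat cycle has no contact phase to measure.
        raise ValueError("cycle has zero peak-to-peak amplitude")
    level = lo + threshold * (hi - lo)
    contacted = sum(1 for sample in cycle if sample > level)
    return contacted / len(cycle)


# Hypothetical, hand-made cycle (not real EGG data).
cycle = [0, 1, 2, 3, 4, 3, 2, 1, 0, 0]
cq_020 = contact_quotient(cycle, threshold=0.2)
cq_025 = contact_quotient(cycle, threshold=0.25)
print(cq_020, cq_025)
```

Even on this toy cycle the two thresholds give different quotients, which is consistent with the abstract's point that the computed value depends on the method chosen.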
|
44
|
Svec JG, Popolo PS, Titze IR. Measurement of vocal doses in speech: experimental procedure and signal processing. LOGOP PHONIATR VOCO 2009; 28:181-92. [PMID: 14686546 DOI: 10.1080/14015430310018892] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Indexed: 10/26/2022]
Abstract
An experimental method for quantifying the amount of voicing over time is described in a tutorial manner. A new procedure for obtaining calibrated sound pressure levels (SPL) of speech from a head-mounted microphone is offered. An algorithm for voicing detection (kv) and fundamental frequency (F0) extraction from an electroglottographic signal is described. The extracted values of SPL, F0, and kv are used to derive five vocal doses: the time dose (total voicing time), the cycle dose (total number of vocal fold oscillatory cycles), the distance dose (total distance travelled by the vocal folds in an oscillatory path), the energy dissipation dose (total amount of heat energy dissipated in the vocal folds) and the radiated energy dose (total acoustic energy radiated from the mouth). The doses measure the vocal load and can be used for studying the effects of vocal fold tissue exposure to vibration.
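The two simplest doses named in this abstract can be sketched directly from their verbal definitions: the time dose accumulates voiced time, and the cycle dose accumulates the number of oscillatory cycles (F0 multiplied by voiced time). The sketch below assumes frame-wise inputs with a fixed frame duration; the function name `time_and_cycle_dose`, the frame length, and the sample values are hypothetical. The distance and energy doses need further quantities (such as vibration amplitude) that the abstract does not specify, so they are not attempted here.

```python
def time_and_cycle_dose(kv, f0, frame_s):
    """Accumulate time dose (s) and cycle dose (cycles) over analysis frames.

    kv      -- voicing flag per frame (1 = voiced, 0 = unvoiced)
    f0      -- fundamental frequency per frame in Hz (ignored when kv is 0)
    frame_s -- frame duration in seconds (an assumed, fixed value)
    """
    if len(kv) != len(f0):
        raise ValueError("kv and f0 must have the same number of frames")
    time_dose = sum(k * frame_s for k in kv)
    cycle_dose = sum(k * f * frame_s for k, f in zip(kv, f0))
    return time_dose, cycle_dose


# Hypothetical frames: four voiced frames out of six, 50 ms each.
kv = [0, 1, 1, 1, 0, 1]
f0 = [0.0, 100.0, 110.0, 120.0, 0.0, 200.0]
time_dose, cycle_dose = time_and_cycle_dose(kv, f0, frame_s=0.05)
print(time_dose, cycle_dose)
```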
|
45
|
Van Lierde KM, Wuyts FL, De Bodt M, Van Cauwenberge P. Age‐related patterns of nasal resonance in normal Flemish children and young adults. ACTA ACUST UNITED AC 2009; 37:344-50. [PMID: 15328773 DOI: 10.1080/02844310310004307] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 10/26/2022]
Abstract
The main purpose of the present acoustical study was to delineate further the changes in nasal resonance in childhood and young adulthood. An additional objective was to collect reference nasal resonance scores for normal Flemish-speaking children. Scores were recorded with a Nasometer while 33 children produced sounds and read three standard passages. We compared the nasal resonance data from the children with those of 58 adults that had been obtained in a previous study. Age had a significant effect on three sounds and two texts. The results indicated that young Flemish adults had higher nasal resonance scores than children, particularly when the reading stimuli included nasal consonants for which a co-ordinated opening and closing function of the velopharyngeal mechanism was required. These results reflect anatomical changes and differences in speech programming associated with growth.
|
46
|
Munger JB, Thomson SL. Frequency response of the skin on the head and neck during production of selected speech sounds. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:4001-4012. [PMID: 19206823 DOI: 10.1121/1.3001703] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
Vibrations within the vocal tract during speech are transmitted through tissue to the skin surface and can be used to transmit speech. Achieving quality speech signals using skin vibration is desirable but problematic, primarily due to the several sound production locations along the vocal tract. The objective of this study was to characterize the frequency content of speech signals on various locations of the head and neck. Signals were recorded using a microphone and accelerometers attached to 15 locations on the heads and necks of 14 males and 10 females. The subjects voiced various phonemes and one phrase. The power spectral densities (PSD) of the phonemes were used to determine a quality ranking for each location and sound. Spectrograms were used to examine signal frequency content for selected locations. A perceptual listening test was conducted and compared to the PSD rankings. The signal-to-noise ratio was found for each location with and without background noise. These results are presented and discussed. Notably, while high-frequency content is attenuated at the throat, it is shown to be detectable at some other locations. The best locations for speech transmission were found to be generally common to males and females.
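The abstract does not say how the signal-to-noise ratio was computed. One common definition, shown here only as an assumed sketch, compares the mean power of a speech segment with that of a noise-only segment and expresses the ratio in decibels. The helper names `mean_power` and `snr_db` and the synthetic sample values are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a segment."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_segment, noise_segment):
    """SNR in dB as 10*log10(P_speech / P_noise), under the assumed definition."""
    p_noise = mean_power(noise_segment)
    if p_noise == 0:
        raise ValueError("noise segment has zero power")
    return 10.0 * math.log10(mean_power(speech_segment) / p_noise)


# Synthetic segments: the noise amplitude is one tenth of the speech amplitude,
# so the power ratio is 100.
speech = [1.0, -1.0] * 50
noise = [0.1, -0.1] * 50
snr = snr_db(speech, noise)
print(round(snr, 6))
```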
|
47
|
George NA, de Mul FFM, Qiu Q, Rakhorst G, Schutte HK. New laryngoscope for quantitative high-speed imaging of human vocal folds vibration in the horizontal and vertical direction. JOURNAL OF BIOMEDICAL OPTICS 2008; 13:064024. [PMID: 19123670 DOI: 10.1117/1.3041164] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/27/2023]
Abstract
We report the design of a novel laser line-triangulation laryngoscope for the quantitative visualization of the three-dimensional movements of human vocal folds during phonation. This is the first successful in vivo recording of the three-dimensional movements of human vocal folds in absolute values. Triangulation images of the vocal folds are recorded at the rate of 4000 fps with a resolution of 256×256 pixels. A special image-processing algorithm is developed to precisely follow the subpixel movements of the laser line image. Vibration profiles in both horizontal and vertical directions are calibrated and measured in absolute SI units with a resolution of ±50 µm. We also present a movie showing the vocal folds dynamics in vertical cross section.
|
48
|
Sie KCY, Chen EY. Management of velopharyngeal insufficiency: development of a protocol and modifications of sphincter pharyngoplasty. Facial Plast Surg 2007; 23:128-39. [PMID: 17516340 DOI: 10.1055/s-2007-979282] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 10/23/2022] Open
Abstract
Velopharyngeal closure is required for normal speech production. Incomplete velopharyngeal closure manifests as resonance disorders and nasal air escape. Management of velopharyngeal insufficiency requires a general knowledge of speech production as well as a more detailed understanding of the velopharyngeal mechanism. Comprehensive evaluation by a velopharyngeal insufficiency team includes medical assessment focusing on airway obstructive symptoms, perceptual speech analysis, and instrumental assessment, which is utilized to characterize the velopharyngeal gap. Options for intervention include speech therapy, intraoral prosthetic devices, and surgery. Surgical interventions can be categorized as palatal, palatopharyngeal, or pharyngeal procedures. The therapeutic challenge lies in achieving velopharyngeal closure during speech production while maintaining patency of the upper airway. We present our protocol for evaluation of velopharyngeal function with a focus on indications for palatoplasty and pharyngoplasty. We also discuss surgical modifications of sphincter pharyngoplasty.
|
49
|
Scheuerle J. Velopharyngeal dysfunction in perspective: a commentary on the Smith and Kuehn article. J Craniofac Surg 2007; 18:262-4. [PMID: 17414270 DOI: 10.1097/scs.0b013e3180341dc4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022] Open
|
50
|
Abstract
This article reviews concepts basic to the evaluation of the speech of persons with velopharyngeal dysfunction. It defines velopharyngeal dysfunction as well as reviews normal and abnormal velopharyngeal function for speech. It defines the common speech characteristics of persons with velopharyngeal dysfunction, including hypernasality, hyponasality, nasal emission, compensatory articulations, and weak pressure consonants. Speech sounds commonly impacted by velopharyngeal dysfunction are discussed. This article identifies the components of a complete speech evaluation as well as identifies anatomic and physiologic measurements of palatal function used to corroborate perceptual speech judgments indicating palatal problems. It identifies special considerations in the evaluation of persons with suspected velopharyngeal dysfunction. It briefly discusses management of velopharyngeal dysfunction. Review questions follow the article.
|