1
Yegnanarayana B, Pannala V. Processing group delay spectrograms for study of formant and harmonic contours in speech signals. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 156:2422-2433. [PMID: 39392353 DOI: 10.1121/10.0032364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 09/20/2024] [Indexed: 10/12/2024]
Abstract
This paper deals with study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. Recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals. Harmonic frequency contours can be observed in the display of the peaks of the instantaneous narrowband equivalent GD spectrogram, derived using the modified SFF analysis of speech signals. For synthetic speech signals, the observed formant contours match the ground truth formant contours from which the signal is derived. For natural speech signals, the observed formant contours match approximately with the given ground truth formant contours mostly in the voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe the contours of formants in the display, automatic extraction of the formant frequencies needs further processing, requiring logic for eliminating the spurious points, without forcing the number of formants.
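The definition quoted in the abstract (group delay as the negative derivative of the phase spectrum) can be evaluated without unwrapping the phase, using a standard Fourier identity: if X is the transform of x[n] and Y is the transform of n·x[n], then the group delay equals Re(Y/X). The sketch below illustrates only that identity on a toy signal. It is not the modified SFF analysis used in the paper, and the function names `dft` and `group_delay` are hypothetical.

```python
import cmath
import math

def dft(x, n_bins):
    """Plain O(N^2) discrete Fourier transform, enough for a short demo."""
    return [
        sum(v * cmath.exp(-2j * math.pi * k * n / n_bins) for n, v in enumerate(x))
        for k in range(n_bins)
    ]

def group_delay(x, eps=1e-12):
    """Group delay in samples at each DFT bin, computed as Re(Y * conj(X)) / |X|^2.

    X is the DFT of x[n] and Y is the DFT of n * x[n]; no phase unwrapping is needed.
    eps is an assumed guard against division by zero at spectral nulls.
    """
    n_bins = len(x)
    X = dft(x, n_bins)
    Y = dft([n * v for n, v in enumerate(x)], n_bins)
    return [
        (Y[k] * X[k].conjugate()).real / max(abs(X[k]) ** 2, eps)
        for k in range(n_bins)
    ]

# A unit impulse delayed by 3 samples has a constant group delay of 3 samples.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delays = group_delay(impulse)
```

For the delayed impulse every bin reports a delay of about 3 samples, which is a convenient sanity check before applying the same function to a windowed speech frame.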
Affiliation(s)
- B Yegnanarayana
  - International Institute of Information Technology, Hyderabad 500032, India
- Vishala Pannala
  - Department of Artificial Intelligence and Data Science, Koneru Lakshmaiah Education Foundation, Hyderabad 500075, India
2
Shadle CH, Fulop SA, Chen WR, Whalen DH. Assessing accuracy of resonances obtained with reassigned spectrograms from the "ground truth" of physical vocal tract models. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:1253-1263. [PMID: 38341748 PMCID: PMC10858790 DOI: 10.1121/10.0024548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 01/04/2024] [Accepted: 01/06/2024] [Indexed: 02/13/2024]
Abstract
The reassigned spectrogram (RS) has emerged as the most accurate way to infer vocal tract resonances from the acoustic signal [Shadle, Nam, and Whalen (2016). "Comparing measurement errors for formants in synthetic and natural vowels," J. Acoust. Soc. Am. 139(2), 713-727]. To date, validating its accuracy has depended on formant synthesis for ground truth values of these resonances. Synthesis is easily controlled, but it has many intrinsic assumptions that do not necessarily accurately realize the acoustics in the way that physical resonances would. Here, we show that physical models of the vocal tract with derivable resonance values allow a separate approach to the ground truth, with a different range of limitations. Our three-dimensional printed vocal tract models were excited by white noise, allowing an accurate determination of the resonance frequencies. Then, sources with a range of fundamental frequencies were implemented, allowing a direct assessment of whether RS avoided the systematic bias towards the nearest strong harmonic to which other analysis techniques are prone. RS was indeed accurate at fundamental frequencies up to 300 Hz; above that, accuracy was somewhat reduced. Future directions include testing mechanical models with the dimensions of children's vocal tracts and making RS more broadly useful by automating the detection of resonances.
Affiliation(s)
- Christine H Shadle
  - Yale Child Study Center, School of Medicine, Yale University, New Haven, Connecticut 06511, USA
- Sean A Fulop
  - Department of Linguistics, Fresno State University, Fresno, California 93740, USA
- Wei-Rong Chen
  - Yale Child Study Center, School of Medicine, Yale University, New Haven, Connecticut 06511, USA
- D H Whalen
  - Yale Child Study Center, School of Medicine, Yale University, New Haven, Connecticut 06511, USA
3
Tan T, Godin OA. Rapid emergence of empirical Green's functions from cross-correlations of ambient sound on continental shelf. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 154:3784-3798. [PMID: 38109405 DOI: 10.1121/10.0023931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Accepted: 11/16/2023] [Indexed: 12/20/2023]
Abstract
Applications of acoustic noise interferometry to passive remote sensing of the ocean rely on retrieval of empirical Green's functions (EGFs) from cross-correlations of ambient sound at spatially separated points. At ranges of tens of ocean depths, obtaining stable and accurate EGF estimates usually requires noise averaging periods of hours or days. Using data acquired in the Shallow Water 2006 experiment on the continental shelf off New Jersey, it is found that at ranges of 40-70 ocean depths, the EGFs can be retrieved with noise averaging times as short as 64 s. The phenomenon is observed for various receiver pairs but does not occur simultaneously in all azimuthal directions. The rapidly emerging EGFs have a wider frequency band and a richer normal mode content than the EGFs obtained in previous studies using long averaging times and are better suited for monitoring physical processes in the water column. Available acoustic and environmental data is examined to understand the conditions leading to rapid EGF emergence from diffuse noise. Strong intermittency is observed in the horizontal directionality of ambient sound. Rapid emergence of EGF in shallow-water waveguide is found to occur when the directionality of diffuse ambient noise is favorable.
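The core operation behind noise interferometry, cross-correlating records from two separated receivers and reading the travel time off the correlation peak, can be sketched on synthetic data. This is a toy illustration under strong assumptions (one noise source, a pure integer-sample delay, no averaging over windows, no modal dispersion); it is not the processing used in the study, and `cross_correlate` and `peak_lag` are hypothetical helper names.

```python
import random

def cross_correlate(x, y, max_lag):
    """Return {lag: sum over n of x[n] * y[n + lag]} for lags in [-max_lag, max_lag]."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, value in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                total += value * y[j]
        out[lag] = total
    return out

def peak_lag(x, y, max_lag):
    """Lag (in samples) at which the cross-correlation is largest."""
    cc = cross_correlate(x, y, max_lag)
    return max(cc, key=cc.get)

# Synthetic "ambient noise": receiver B hears the same noise 5 samples later.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(400)]
delay = 5
receiver_a = noise
receiver_b = [0.0] * delay + noise[:-delay]

estimated_delay = peak_lag(receiver_a, receiver_b, max_lag=20)
```

Dividing the estimated lag by the sampling rate would give a travel time; with field data the peak only emerges after averaging many such correlations, which is why the averaging time discussed in the abstract matters.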
Affiliation(s)
- Tsuwei Tan
  - Department of Marine Science, ROC Naval Academy, 813 Kaohsiung, Taiwan
- Oleg A Godin
  - Department of Physics, Naval Postgraduate School, 833 Dyer Road, Monterey, California 93943-5216, USA
4
Abeysekara LL, Kolambahewage C, Pathirana PN, Horne M, Szmulewicz DJ, Corben LA. A Novel Feature from Instrumented Utensils for Clinical Assessment of Friedreich Ataxia. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2023; 2023:1-4. [PMID: 38083604 DOI: 10.1109/embc40787.2023.10340519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Friedreich Ataxia (FRDA) is an inherited disorder that affects the cerebellum and other regions of the human nervous system. It causes impaired movement that affects quality and reduces lifespan. Clinical assessment of movement is a key part of diagnosis and assessment of severity. Recent studies have examined instrumented measurement of movement to support clinical assessments. This paper presents a frequency domain approach based on Average Band Power (ABP) estimation for clinical assessment using Inertial Measurement Unit (IMU) signals. The IMUs were attached to a 3D printed spoon and a cup. Participants used them to mimic eating and drinking activities during data collection. For both activities, the ABP of frequency components from individuals with FRDA clustered in 0 to 0.2Hz band. This suggests that the ABP of this frequency is affected by FRDA irrespective of the device or activity. The ABP in this frequency band was used to distinguish between FRDA and non-ataxic participants using the Area Under the Receiver-Operating-Characteristic Curve (AUC) which produced peak values greater than 0.8. The machine learning models (logistic regression and neural networks) produced accuracy greater than 80% with these features common to both devices.
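One plausible way to compute an average band power feature is to average periodogram values over the bins that fall inside the band. The abstract does not state which spectral estimator the authors used, so the sketch below is an assumption throughout: the function name `average_band_power`, the one-sided periodogram scaling, and the synthetic 0.1 Hz test signal are all illustrative.

```python
import cmath
import math

def average_band_power(x, fs, f_lo, f_hi):
    """Mean one-sided periodogram power over DFT bins with f_lo <= f <= f_hi (Hz)."""
    n = len(x)
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
            # DC and Nyquist bins are not doubled in a one-sided spectrum.
            is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
            scale = 1.0 if is_edge else 2.0
            powers.append(scale * abs(coeff) ** 2 / n ** 2)
    return sum(powers) / len(powers) if powers else 0.0

# Synthetic slow movement: a 0.1 Hz sinusoid sampled at 2 Hz for 20 s.
fs = 2.0
slow = [math.sin(2 * math.pi * 0.1 * i / fs) for i in range(40)]
low = average_band_power(slow, fs, 0.0, 0.2)    # band highlighted in the abstract
high = average_band_power(slow, fs, 0.25, 1.0)  # everything above it
```

For this signal nearly all of the power lands in the 0 to 0.2 Hz band, so `low` is far larger than `high`. With real IMU recordings a windowed or Welch-style estimate would likely be preferable to a raw periodogram.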
5
Cox SR, Huang T, Chen WR, Ng ML. An acoustic study of Cantonese alaryngeal speech in different speaking conditions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 153:2973. [PMID: 37212513 PMCID: PMC10205142 DOI: 10.1121/10.0019471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 04/30/2023] [Accepted: 05/02/2023] [Indexed: 05/23/2023]
Abstract
Esophageal (ES) speech, tracheoesophageal (TE) speech, and the electrolarynx (EL) are common methods of communication following the removal of the larynx. Our recent study demonstrated that intelligibility may increase for Cantonese alaryngeal speakers using clear speech (CS) compared to their everyday "habitual speech" (HS), but the reasoning is still unclear [Hui, Cox, Huang, Chen, and Ng (2022). Folia Phoniatr. Logop. 74, 103-111]. The purpose of this study was to assess the acoustic characteristics of vowels and tones produced by Cantonese alaryngeal speakers using HS and CS. Thirty-one alaryngeal speakers (9 EL, 10 ES, and 12 TE speakers) read The North Wind and the Sun passage in HS and CS. Vowel formants, vowel space area (VSA), speaking rate, pitch, and intensity were examined, and their relationship to intelligibility were evaluated. Statistical models suggest that larger VSAs significantly improved intelligibility, but slower speaking rate did not. Vowel and tonal contrasts did not differ between HS and CS for all three groups, but the amount of information encoded in fundamental frequency and intensity differences between high and low tones positively correlated with intelligibility for TE and ES groups, respectively. Continued research is needed to understand the effects of different speaking conditions toward improving acoustic and perceptual characteristics of Cantonese alaryngeal speech.
Affiliation(s)
- Steven R Cox
  - Department of Communication Sciences and Disorders, Adelphi University, Garden City, New York 11530, USA
- Ting Huang
  - Haskins Laboratories, New Haven, Connecticut 06511, USA
- Wei-Rong Chen
  - Haskins Laboratories, New Haven, Connecticut 06511, USA
- Manwa L Ng
  - Speech Science Laboratory, Faculty of Education, University of Hong Kong, Hong Kong SAR, China
6
Whalen DH, Chen WR, Shadle CH, Fulop SA. Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986). THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:933. [PMID: 36050157 PMCID: PMC9374483 DOI: 10.1121/10.0013410] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 05/05/2023]
Abstract
Formants in speech signals are easily identified, largely because formants are defined to be local maxima in the wideband sound spectrum. Sadly, this is not what is of most interest in analyzing speech; instead, resonances of the vocal tract are of interest, and they are much harder to measure. Klatt [(1986). in Proceedings of the Montreal Satellite Symposium on Speech Recognition, 12th International Congress on Acoustics, edited by P. Mermelstein (Canadian Acoustical Society, Montreal), pp. 5-7] showed that estimates of resonances are biased by harmonics while the human ear is not. Several analysis techniques placed the formant closer to a strong harmonic than to the center of the resonance. This "harmonic attraction" can persist with newer algorithms and in hand measurements, and systematic errors can persist even in large corpora. Research has shown that the reassigned spectrogram is less subject to these errors than linear predictive coding and similar measures, but it has not been satisfactorily automated, making its wider use unrealistic. Pending better techniques, the recommendations are (1) acknowledge limitations of current analyses regarding influence of F0 and limits on granularity, (2) report settings more fully, (3) justify settings chosen, and (4) examine the pattern of F0 vs F1 for possible harmonic bias.
Affiliation(s)
- D H Whalen
  - Haskins Laboratories, New Haven, Connecticut 06511, USA
- Wei-Rong Chen
  - Haskins Laboratories, New Haven, Connecticut 06511, USA
- Sean A Fulop
  - Department of Linguistics, California State University Fresno, Fresno, California 93740, USA
7
Joolee JB, Uddin MA, Jeon S. Deep multi-model fusion network based real object tactile understanding from haptic data. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03181-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/23/2022]
8
El-Gazar S, El-Shafai W, El-Banby G, F. A. Hamed H, M. Salama G, Abd-Elnaby M, E. Abd El-Samie F. Cancelable Speaker Identification System Based on Optical-Like Encryption Algorithms. COMPUTER SYSTEMS SCIENCE AND ENGINEERING 2022; 43:87-102. [DOI: 10.32604/csse.2022.022722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 09/30/2021] [Indexed: 09/01/2023]
9
Sainburg T, Gentner TQ. Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions. Front Behav Neurosci 2021; 15:811737. [PMID: 34987365 PMCID: PMC8721140 DOI: 10.3389/fnbeh.2021.811737] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/09/2021] [Accepted: 11/29/2021] [Indexed: 11/23/2022]
Abstract
Recently developed methods in computational neuroethology have enabled increasingly detailed and comprehensive quantification of animal movements and behavioral kinematics. Vocal communication behavior is well poised for application of similar large-scale quantification methods in the service of physiological and ethological studies. This review describes emerging techniques that can be applied to acoustic and vocal communication signals with the goal of enabling study beyond a small number of model species. We review a range of modern computational methods for bioacoustics, signal processing, and brain-behavior mapping. Along with a discussion of recent advances and techniques, we include challenges and broader goals in establishing a framework for the computational neuroethology of vocal communication.
Affiliation(s)
- Tim Sainburg
  - Department of Psychology, University of California, San Diego, La Jolla, CA, United States
  - Center for Academic Research & Training in Anthropogeny, University of California, San Diego, La Jolla, CA, United States
- Timothy Q. Gentner
  - Department of Psychology, University of California, San Diego, La Jolla, CA, United States
  - Neurosciences Graduate Program, University of California, San Diego, La Jolla, CA, United States
  - Neurobiology Section, Division of Biological Sciences, University of California, San Diego, La Jolla, CA, United States
  - Kavli Institute for Brain and Mind, University of California, San Diego, La Jolla, CA, United States
10
Shiels TA, Oxley TJ, Fitzgerald PB, Opie NL, Wong YT, Grayden DB, John SE. Feasibility of using discrete Brain Computer Interface for people with Multiple Sclerosis. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2021; 2021:5686-5689. [PMID: 34892412 DOI: 10.1109/embc46164.2021.9629518] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
AIM Brain-Computer Interfaces (BCIs) hold promise to provide people with partial or complete paralysis the ability to control assistive technology. This study reports offline classification of imagined and executed movements of the upper and lower limb in one participant with multiple sclerosis and people with no limb function deficits. METHODS We collected neural signals using electroencephalography (EEG) while participants performed executed and imagined motor tasks as directed by prompts shown on a screen. RESULTS Participants with no limb function deficits attained >70% decoding accuracy on their best-imagined task compared to rest and on at least one task comparison. The participant with multiple sclerosis also achieved accuracies within the range of participants with no limb function loss. Clinical Relevance - While only one case study is provided, it was promising that the participant with MS was able to achieve comparable classification to that of the seven healthy controls. Further studies are needed to assess whether people suffering from MS may be able to use a BCI to improve their quality of life.
11
Ando S. Time-frequency representation with variant array of frequency-domain Prony estimators. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:2682. [PMID: 34717458 DOI: 10.1121/10.0006539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Accepted: 09/13/2021] [Indexed: 06/13/2023]
Abstract
The frequency-domain Prony method (FDPM) [Ando, IEEE Trans. Signal Process. 68, 3461-3470 (2020)] provides a novel and exact short-time, frequency-decomposed scheme for autoregressive model identification and sinusoidal parameter estimation with a superior statistical performance. By using it as localized estimation elements, we construct the time-frequency representation (TFR) of signals as the frequency-reassigned map of the damped sinusoidal parameters of their components. The FDPM for both single and multiple sinusoids is based on a small number of windowless Fourier coefficients of sampled sequence. Thus, a unified and flexible construction of resolution and decomposition structures including linear and log-linear frequency arrays and their combination is possible, and dense analysis along the time axis can be implemented without a significant increase in computational cost. Conditions for constructing the frequency-variant arrays are formulated. Two cooperative behaviors in the TFR are considered to find stable traces of frequencies and rapidly time-varying incidences and components. Several experiments are shown to confirm extended features and performances of the proposed TFR using musical, speech, and natural sound signals.
Affiliation(s)
- Shigeru Ando
  - University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
12
Tan TW, Godin OA. Passive acoustic characterization of sub-seasonal sound speed variations in a coastal ocean. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:2717. [PMID: 34717456 DOI: 10.1121/10.0006664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/23/2021] [Accepted: 09/21/2021] [Indexed: 06/13/2023]
Abstract
Acoustic noise interferometry is applied to retrieve empirical Green's functions (EGFs) from the ambient and shipping noise data acquired in the Shallow Water 2006 experiment on the continental shelf off New Jersey. Despite strong internal wave-induced perturbations of the sound speed in water, EGFs are found on 31 acoustic paths by cross-correlating the noise recorded on a single hydrophone with noise on the hydrophones of a horizontal linear array about 3.6 km away. Datasets from two non-overlapping 15-day observation periods are considered. Dispersion curves of three low-order normal modes at frequencies below 110 Hz are extracted from the EGFs with the time-warping technique. The dispersion curves from the first dataset were previously employed to estimate the seabed properties. Here, using this seabed model, we invert the differences between the dispersion curves obtained from the two datasets for the variation of the time-averaged sound speed profile (SSP) in water between the two observation periods. Results of the passive SSP inversion of the second dataset are compared with the ground truth derived from in situ temperature measurements. The effect of temporal variability of the water column during noise-averaging time on EGF retrieval is discussed and quantified.
Affiliation(s)
- Tsu Wei Tan
  - Department of Marine Science, ROC Naval Academy, 813 Kaohsiung, Taiwan
- Oleg A Godin
  - Department of Physics, Naval Postgraduate School, 833 Dyer Road, Monterey, California 93943-5216, USA
13
Multi-modal physiological signals based fear of heights analysis in virtual reality scenes. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
14
Averbuch G. The spectrogram, method of reassignment, and frequency-domain beamforming. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:747. [PMID: 33639804 DOI: 10.1121/10.0003384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2020] [Accepted: 01/06/2021] [Indexed: 05/21/2023]
Abstract
A smeared spectrogram is a result of the smoothing kernel in the short-time Fourier-transform (STFT). Besides the smeared energy, time and frequency phase information is also smeared, i.e., spectral components may contain imprecise phase information. The STFT is also used as the basis for more advanced signal processing techniques such as frequency-domain beamforming and cross correlation (CC). Both methods seek the delay time between signals by exploring phase-shifts in the frequency domain. Due to the inexact phase information in some of the time-frequency elements, their phase shifts are incorrect. This study re-introduces the reassigned spectrogram (RS) as a measure to fix the STFT artifacts. Moreover, it is shown that by using the RS, phase shifts can be optimized and improve beamforming and CC results. Synthetic and recorded data are used to show the advantage of using the RS in time-frequency analysis, CC, and beamforming. Results show that, subject to certain constraints, the RS provides exact time-frequency representation of deterministic signals and significantly improve CC and beamforming results. Array analysis of infrasonic signals shows that better results are obtained by either the RS- or STFT-based analysis depending on the signals' spectral components and noise levels.
Affiliation(s)
- Gil Averbuch
  - Roy M. Huffington Department of Earth Sciences, Southern Methodist University, Dallas, Texas 75205, USA
15
Alagumariappan P, Krishnamurthy K, Kandiah S, Cyril E, V R. Diagnosis of Type 2 Diabetes Using Electrogastrograms: Extraction and Genetic Algorithm–Based Selection of Informative Features. JMIR BIOMEDICAL ENGINEERING 2020. [DOI: 10.2196/20932] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
Background
Electrogastrography is a noninvasive electrophysiological procedure used to measure gastric myoelectrical activity. EGG methods have been used to investigate the mechanisms of the human digestive system and as a clinical tool. Abnormalities in gastric myoelectrical activity have been observed in subjects with diabetes.
Objective
The objective of this study was to use the electrogastrograms (EGGs) from healthy individuals and subjects with diabetes to identify potentially informative features for the diagnosis of diabetes using EGG signals.
Methods
A total of 30 features were extracted from the EGGs of 30 healthy individuals and 30 subjects with diabetes. Of these, 20 potentially informative features were selected using a genetic algorithm–based feature selection process. The selected features were analyzed for further classification of EGG signals from healthy individuals and subjects with diabetes.
Results
This study demonstrates that there are distinct variations between the EGG signals recorded from healthy individuals and those from subjects with diabetes. Furthermore, the study reveals that the features Maragos fractal dimension and Hausdorff box-counting fractal dimension have a high degree of correlation with the mobility of EGGs from healthy individuals and subjects with diabetes.
Conclusions
Based on the analysis on the extracted features, the selected features are suitable for the design of automated classification systems to identify healthy individuals and subjects with diabetes.
16
Shin HW, Kim HJ, Jang YK, You HS, Huh H, Choi YJ, Choi SU, Hong JS. Monitoring of anesthetic depth and EEG band power using phase lag entropy during propofol anesthesia. BMC Anesthesiol 2020; 20:49. [PMID: 32102676 PMCID: PMC7045415 DOI: 10.1186/s12871-020-00964-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/14/2019] [Accepted: 02/19/2020] [Indexed: 12/18/2022]
Abstract
Background Phase lag entropy (PLE) is a novel anesthetic depth indicator that uses four-channel electroencephalography (EEG) to measure the temporal pattern diversity in the phase relationship of frequency signals in the brain. The purpose of the study was to evaluate the anesthetic depth monitoring using PLE and to evaluate the correlation between PLE and bispectral index (BIS) values during propofol anesthesia. Methods In thirty-five adult patients undergoing elective surgery, anesthesia was induced with propofol using target-controlled infusion (the Schneider model). We recorded the PLE value, raw EEG, BIS value, and hemodynamic data when the target effect-site concentration (Ce) of propofol reached 2, 3, 4, 5, and 6 μg/ml before intubation and 6, 5, 4, 3, 2 μg/ml after intubation and injection of muscle relaxant. We analyzed whether PLE and raw EEG data from the PLE monitor reflected the anesthetic depth as the Ce of propofol changed, and whether PLE values were comparable to BIS values. Results PLE values were inversely correlated to changes in propofol Ce (propofol Ce from 0 to 6.0 μg/ml, r2 = − 0.83; propofol Ce from 6.0 to 2.0 μg/ml, r2 = − 0.46). In the spectral analysis of EEG acquired from the PLE monitor, the persistence spectrogram revealed a wide distribution of power at loss of consciousness (LOC) and recovery of consciousness (ROC), with a narrow distribution during unconsciousness. The power spectrogram showed the typical pattern seen in propofol anesthesia with slow alpha frequency band oscillation. The PLE value demonstrated a strong correlation with the BIS value during the change in propofol Ce from 0 to 6.0 μg/ml (r2 = 0.84). PLE and BIS values were similar at LOC (62.3 vs. 61.8) (P > 0.05), but PLE values were smaller than BIS values at ROC (64.4 vs 75.7) (P < 0.05). Conclusions The PLE value is a useful anesthetic depth indicator, similar to the BIS value, during propofol anesthesia. 
Spectral analysis of EEG acquired from the PLE monitor demonstrated the typical patterns seen in propofol anesthesia. Trial registration This clinical trial was retrospectively registered at ClinicalTrials.gov at October 2017 (NCT03299621).
Affiliation(s)
- Hye Won Shin
  - Department of Anesthesiology and Pain Medicine, Korea University Anam Hospital, College of Medicine, Korea University, Goryodae-ro 73, Seongbuk-gu, 02841, Seoul, Republic of Korea
- Hyun Jung Kim
  - Department of Anesthesiology and Pain Medicine, Ewha University Magok Hospital, College of Medicine, Ewha University, Seoul, Republic of Korea
- Yoo Kyung Jang
  - Department of Anesthesiology and Pain Medicine, Korea University Anam Hospital, College of Medicine, Korea University, Goryodae-ro 73, Seongbuk-gu, 02841, Seoul, Republic of Korea
- Hae Sun You
  - Department of Anesthesiology and Pain Medicine, Korea University Anam Hospital, College of Medicine, Korea University, Goryodae-ro 73, Seongbuk-gu, 02841, Seoul, Republic of Korea
- Hyub Huh
  - Department of Anesthesiology and Pain Medicine, Korea University Anam Hospital, College of Medicine, Korea University, Goryodae-ro 73, Seongbuk-gu, 02841, Seoul, Republic of Korea
- Yoon Ji Choi
  - Department of Anesthesiology and Pain Medicine, Korea University Ansan Hospital, College of Medicine, Korea University, Gyeonggi-do, Republic of Korea
- Seung Uk Choi
  - Department of Anesthesiology and Pain Medicine, Korea University Anam Hospital, College of Medicine, Korea University, Goryodae-ro 73, Seongbuk-gu, 02841, Seoul, Republic of Korea
- Ji Su Hong
  - Department of Anesthesiology and Pain Medicine, Korea University Anam Hospital, College of Medicine, Korea University, Goryodae-ro 73, Seongbuk-gu, 02841, Seoul, Republic of Korea
17
Ahmedt-Aristizabal D, Sarfraz MS, Denman S, Nguyen K, Fookes C, Dionisio S, Stiefelhagen R. Motion Signatures for the Analysis of Seizure Evolution in Epilepsy. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2020; 2019:2099-2105. [PMID: 31946315 DOI: 10.1109/embc.2019.8857743] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
In epilepsy, semiology refers to the study of patient behavior and movement, and their temporal evolution during epileptic seizures. Understanding semiology provides clues to the cerebral networks underpinning the epileptic episode and is a vital resource in the pre-surgical evaluation. Recent advances in video analytics have been helpful in capturing and quantifying epileptic seizures. Nevertheless, the automated representation of the evolution of semiology, as examined by neurologists, has not been appropriately investigated. From initial seizure symptoms until seizure termination, motion patterns of isolated clinical manifestations vary over time. Furthermore, epileptic seizures frequently evolve from one clinical manifestation to another, and their understanding cannot be overlooked during a presurgery evaluation. Here, we propose a system capable of computing motion signatures from videos of face and hand semiology to provide quantitative information on the motion, and the correlation between motions. Each signature is derived from a sparse saliency representation established by the magnitude of the optical flow field. The developed computer-aided tool provides a novel approach for physicians to analyze semiology as a flow of signals without interfering in the healthcare environment. We detect and quantify semiology using detectors based on deep learning and via a novel signature scheme, which is independent of the amount of data and seizure differences. The system reinforces the benefits of computer vision for non-obstructive clinical applications to quantify epileptic seizures recorded in real-life healthcare conditions.
|
18
|
Siddharth S, Trivedi MM. On Assessing Driver Awareness of Situational Criticalities: Multi-modal Bio-Sensing and Vision-Based Analysis, Evaluations, and Insights. Brain Sci 2020; 10:E46. [PMID: 31952156 PMCID: PMC7016967 DOI: 10.3390/brainsci10010046] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/05/2019] [Revised: 01/10/2020] [Accepted: 01/10/2020] [Indexed: 11/18/2022] Open
Abstract
Automobiles for our roadways are increasingly using advanced driver assistance systems. The adoption of such new technologies requires us to develop novel perception systems not only to accurately understand the situational context of these vehicles, but also to infer the driver's awareness in differentiating between safe and critical situations. This manuscript focuses on the specific problem of inferring driver awareness in the context of attention analysis and hazardous incident activity. Even after the development of wearable and compact multi-modal bio-sensing systems in recent years, their application in the driver awareness context has been scarcely explored. The capability of simultaneously recording different kinds of bio-sensing data in addition to traditionally employed computer vision systems provides exciting opportunities to explore the limitations of these sensor modalities. In this work, we explore the applications of three different bio-sensing modalities, namely electroencephalogram (EEG), photoplethysmogram (PPG), and galvanic skin response (GSR), along with a camera-based vision system in the driver awareness context. We assess the information from these sensors independently and together using both signal processing- and deep learning-based tools. We show that our methods outperform previously reported studies in classifying driver attention and detecting hazardous/non-hazardous situations for short time scales of two seconds. We use EEG and vision data for high resolution temporal classification (two seconds) while additionally employing PPG and GSR over longer time periods. We evaluate our methods by collecting user data from twelve subjects for two real-world driving datasets, of which one is publicly available (KITTI dataset) while the other was collected by us (LISA dataset) with the vehicle being driven in an autonomous mode. This work presents an exhaustive evaluation of multiple sensor modalities on two different datasets for attention monitoring and hazardous events classification.
Affiliation(s)
- Siddharth Siddharth
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Mohan M Trivedi
  - Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA
|
19
|
Yang Z, Zhuang X, Sreenivasan K, Mishra V, Curran T, Cordes D. A robust deep neural network for denoising task-based fMRI data: An application to working memory and episodic memory. Med Image Anal 2019; 60:101622. [PMID: 31811979 DOI: 10.1016/j.media.2019.101622] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 11/20/2018] [Revised: 10/10/2019] [Accepted: 11/25/2019] [Indexed: 01/21/2023]
Abstract
In this study, a deep neural network (DNN) is proposed to reduce the noise in task-based fMRI data without explicitly modeling noise. The DNN consists of one temporal convolutional layer, one long short-term memory (LSTM) layer, one time-distributed fully-connected layer, and one unconventional selection layer in sequential order. The LSTM layer takes not only the current time point but also what was perceived in a previous time point as its input to characterize the temporal autocorrelation of fMRI data. The fully-connected layer weights the output of the LSTM layer, and the output denoised fMRI time series is selected by the selection layer. Assuming that task-related neural response is limited to gray matter, the model parameters in the DNN are optimized by maximizing the correlation difference between gray matter voxels and white matter or ventricular cerebrospinal fluid voxels. Instead of targeting a particular noise source, the proposed neural network takes advantage of the task design matrix to better extract task-related signal in fMRI data. The DNN, along with other traditional denoising techniques, has been applied to simulated data, working memory task fMRI data acquired from a cohort of healthy subjects, and episodic memory task fMRI data acquired from a small set of healthy elderly subjects. Qualitative and quantitative measurements were used to evaluate the performance of different denoising techniques. In the simulation, DNN improves fMRI activation detection and also adapts to varying hemodynamic response functions across different brain regions. DNN efficiently reduces physiological noise and generates more homogeneous task-response correlation maps in real data.
Affiliation(s)
- Zhengshi Yang
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, 89106, USA
- Xiaowei Zhuang
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, 89106, USA
- Karthik Sreenivasan
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, 89106, USA
- Virendra Mishra
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, 89106, USA
- Tim Curran
  - Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, 80309, USA
- Dietmar Cordes
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, 89106, USA; Department of Psychology and Neuroscience, University of Colorado, Boulder, CO, 80309, USA
|
20
|
Atlan LS, Margulies SS. Frequency-Dependent Changes in Resting State Electroencephalogram Functional Networks after Traumatic Brain Injury in Piglets. J Neurotrauma 2019; 36:2558-2578. [PMID: 30909806 PMCID: PMC6709726 DOI: 10.1089/neu.2017.5574] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/17/2023] Open
Abstract
Traumatic brain injury (TBI) is a major health concern in children, as it can cause chronic cognitive and behavioral deficits. The lack of objective involuntary metrics for the diagnosis of TBI makes prognosis more challenging, especially in the pediatric context, in which children are often unable to articulate their symptoms. Resting state electroencephalograms (EEG), which are inexpensive and non-invasive, and do not require subjects to perform cognitive tasks, have not yet been used to create functional brain networks in relation to TBI in children or non-human animals; here we report the first such study. We recorded resting state EEG in awake piglets before and after TBI, from which we generated EEG functional networks from the alpha (8-12 Hz), beta (16.5-25 Hz), broad (1-35 Hz), delta (1-3.5 Hz), gamma (30-35 Hz), sigma (13-16 Hz), and theta (4-7.5 Hz) frequency bands. We hypothesize that mild TBI will induce persistent frequency-dependent changes in the 4-week-old piglet at acute and chronic time points. Hyperconnectivity was found in several frequency band networks after TBI. This study serves as proof of concept that the study of EEG functional networks in awake piglets may be useful for the development of diagnostic metrics for TBI in children.
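The band edges listed in this abstract can be written down as a small lookup table. The sketch below is only an illustration: the dictionary name `EEG_BANDS` and the helper `bands_containing` are hypothetical, and treating each band as a closed interval is an assumption, since the abstract does not say how edge frequencies or the gaps between bands were handled.

```python
# Frequency bands in Hz, copied from the abstract above.
# Closed intervals are an assumption made for this sketch.
EEG_BANDS = {
    "delta": (1.0, 3.5),
    "theta": (4.0, 7.5),
    "alpha": (8.0, 12.0),
    "sigma": (13.0, 16.0),
    "beta": (16.5, 25.0),
    "gamma": (30.0, 35.0),
    "broad": (1.0, 35.0),
}


def bands_containing(freq_hz):
    """Return the names of all bands whose interval contains freq_hz."""
    return [name for name, (lo, hi) in EEG_BANDS.items() if lo <= freq_hz <= hi]
```

Because the broad band spans 1-35 Hz, every in-range frequency belongs to it in addition to at most one narrow band; frequencies such as 27 Hz fall in the broad band only.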
Affiliation(s)
- Lorre S. Atlan
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania
- Susan S. Margulies
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania
|
21
|
Yang Z, Zhuang X, Bird C, Sreenivasan K, Mishra V, Banks S, Cordes D. Performing Sparse Regularization and Dimension Reduction Simultaneously in Multimodal Data Fusion. Front Neurosci 2019; 13:642. [PMID: 31333396 PMCID: PMC6618346 DOI: 10.3389/fnins.2019.00642] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/14/2019] [Accepted: 06/04/2019] [Indexed: 01/28/2023] Open
Abstract
Collecting multiple modalities of neuroimaging data on the same subject is increasingly becoming the norm in clinical practice and research. Fusing multiple modalities to find related patterns is a challenge in neuroimaging analysis. Canonical correlation analysis (CCA) is commonly used as a symmetric data fusion technique to find related patterns among multiple modalities. In CCA-based data fusion, principal component analysis (PCA) is frequently applied as a preprocessing step to reduce data dimension, followed by CCA on the dimension-reduced data. PCA, however, does not differentiate informative voxels from non-informative voxels in the dimension reduction step. Sparse PCA (sPCA) extends traditional PCA by adding sparse regularization that assigns zero weights to non-informative voxels. In this study, sPCA is incorporated into CCA-based fusion analysis and applied to neuroimaging data. A cross-validation method is developed and validated to optimize the parameters in sPCA. Different simulations are carried out to evaluate the improvement gained by introducing a sparsity constraint to PCA. Four fusion methods (sPCA+CCA, PCA+CCA, parallel ICA, and sparse CCA) were applied to structural and functional magnetic resonance imaging data of mild cognitive impairment subjects and normal controls. Our results indicate that sPCA can significantly reduce the impact of non-informative voxels and lead to improved statistical power in uncovering disease-related patterns by a fusion analysis.
Affiliation(s)
- Zhengshi Yang
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
- Xiaowei Zhuang
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
- Christopher Bird
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
- Karthik Sreenivasan
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
- Virendra Mishra
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
- Sarah Banks
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
- Dietmar Cordes
  - Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, United States
  - Departments of Psychology and Neuroscience, University of Colorado, Boulder, CO, United States
|
22
|
Chen WR, Whalen DH, Shadle CH. F0-induced formant measurement errors result in biased variabilities. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:EL360. [PMID: 31153348 PMCID: PMC6909981 DOI: 10.1121/1.5103195] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/21/2019] [Revised: 04/18/2019] [Accepted: 04/22/2019] [Indexed: 05/21/2023]
Abstract
Many developmental studies attribute reduction of acoustic variability to increasing motor control. However, linear prediction-based formant measurements are known to be biased toward the nearest harmonic of F0, especially at high F0s. Thus, the amount of reported formant variability generated by changes in F0 is unknown. Here, 470 000 vowels were synthesized, mimicking statistics reported in four developmental studies, to estimate the proportion of formant variability that can be attributed to F0 bias, as well as other formant measurement errors. Results showed that the F0-induced formant measurement errors are large and systematic, and cannot be eliminated by a large sample size.
Affiliation(s)
- Wei-Rong Chen
  - Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511
- D H Whalen
  - Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511
- Christine H Shadle
  - Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511
|
23
|
Abstract
The improvement of the readability of time-frequency transforms is an important topic in the field of fast-oscillating signal processing. The reassignment method is often used due to its adaptivity to different transforms and nice formal properties. However, it strongly depends on the selection of the analysis window and it requires the computation of the same transform using three different but well-defined windows. The aim of this work is to provide a simple method for spectrogram reassignment, named FIRST (Fast Iterative and Robust Reassignment Thinning), with comparable or better precision than the classical reassignment method, a reduced computational effort, and near independence from the adopted analysis window. To this aim, the time-frequency evolution of a multicomponent signal is formally provided and, based on this law, only a subset of time-frequency points is used to improve spectrogram readability. Those points are the ones less influenced by interfering components. Preliminary results show that the proposed method can efficiently reassign spectrograms more accurately than the classical method in the case of interfering signal components, with a significant gain in terms of required computational effort.
|
24
|
Female resistance and harmonic convergence influence male mating success in Aedes aegypti. Sci Rep 2019; 9:2145. [PMID: 30765779 PMCID: PMC6375921 DOI: 10.1038/s41598-019-38599-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/27/2018] [Accepted: 12/21/2018] [Indexed: 01/01/2023] Open
Abstract
Despite the importance of mosquito mating biology to reproductive control strategies, a mechanistic understanding of individual mating interactions is currently lacking. Using synchronised high-speed video and audio recordings, we quantified behavioural and acoustic features of mating attempts between tethered female and free-flying male Aedes aegypti. In most couplings, males were actively displaced by female kicks in the early phases of the interaction, while flight cessation prior to adoption of the pre-copulatory mating pose also inhibited copulation. Successful males were kicked at a reduced rate and sustained paired contact-flight for longer than those that were rejected. We identified two distinct phases of acoustic interaction. Rapid frequency modulation of flight tones was observed in all interactions up to acceptance of the male. Harmonic convergence (wingbeat frequency matching) was detected more often in successful attempts, coinciding with the transition to stabilised paired flight and subsequent genital contact. Our findings provide a clearer understanding of the relationship between acoustic interactions and mating performance in mosquitoes, offering insights which may be used to target improvements in laboratory reared lines.
|
25
|
Ferreira VG, Montecino HC, Ndehedehe CE, Heck B, Gong Z, de Freitas SRC, Westerhaus M. Space-based observations of crustal deflections for drought characterization in Brazil. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 644:256-273. [PMID: 29981974 DOI: 10.1016/j.scitotenv.2018.06.277] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/19/2018] [Revised: 06/19/2018] [Accepted: 06/22/2018] [Indexed: 06/08/2023]
Abstract
Widespread environmental impacts of frequent drought episodes in Brazil have resulted in several drought-related diagnostics studies. However, the potential of many "opportunistic sensors", such as the Global Positioning System (GPS), has not yet been considered in hydrological hazard monitoring in Brazil. In this study, the response of the Earth's crust to Brazil's 2012-2015 drought event in different structural provinces is analyzed by comparing GPS-observed vertical crustal deformations (VCDs) with the terrestrial water storage (TWS) derived from the Gravity Recovery and Climate Experiment (GRACE). The results indicate that there is no spatial correlation between annual amplitudes of the TWS and VCDs in different structural provinces apart from the purely elastic response of the crust to TWS dynamics, at almost all the 39 GPS stations that were analyzed. However, approximately 15% of the monitoring stations show that VCD leads TWS with a phase lag of 2-4 months. Errors associated with VCD and TWS are within the accepted range for space geodetic techniques (i.e., GPS and GRACE) and despite the need for further investigation, the phase lead seems to be associated with rainfall, which impacts the TWS through the hydrographs. Overall, the GPS-based drought index (DIVCD) reflects the water depletion in many regions of Brazil, which agrees with the GRACE-based DITWS in terms of the Spearman correlation coefficient (ranging from 0.4 to 0.9) in the Amazon, Tocantins, La Plata, and São Francisco river basins. This agreement confirms the drought persistence during the study period and that DIVCD can be used to monitor hydrological droughts. In regions in which DITWS sufficiently agrees with DIVCD (48% of the sites), near real-time drought monitoring is feasible. This could be useful in the optimization of models for the forward prediction of drought events in other regions worldwide, where GPS vertical displacements strongly correlate with hydrological GRACE signals.
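The agreement between the two drought indices is reported above as a Spearman correlation coefficient. As a reminder of what that statistic measures, here is a minimal sketch that ranks both series (with average ranks for ties) and takes the Pearson correlation of the ranks. The function names are hypothetical and the code is not taken from the study; it only illustrates the standard definition.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0.0 or sy == 0.0:
        raise ValueError("a constant series has no defined rank correlation")
    return cov / (sx * sy)
```

Because only ranks enter the calculation, any monotonic relationship between two index series gives a coefficient of 1 even when the relationship is not linear.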
Affiliation(s)
- V G Ferreira
  - School of Earth Sciences and Engineering, Hohai University, Nanjing 211100, China
- H C Montecino
  - Department of Geodesy Science and Geomatics, University of Concepción, Los Angeles 4451032, Chile
- C E Ndehedehe
  - Australian Rivers Institute and Griffith School of Environment & Science, Griffith University, Nathan, Queensland 4111, Australia
- B Heck
  - Geodetic Institute of Karlsruhe, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
- Z Gong
  - State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, 1st 1 Xi Kang Lu, Nanjing 210098, Jiangsu, China
- S R C de Freitas
  - Geodetic Sciences Graduation Course, Federal University of Paraná, Curitiba 81.531-990, Brazil
- M Westerhaus
  - Geodetic Institute of Karlsruhe, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
|
26
|
|
27
|
Oliveira BL, Godinho D, O'Halloran M, Glavin M, Jones E, Conceição RC. Diagnosing Breast Cancer with Microwave Technology: remaining challenges and potential solutions with machine learning. Diagnostics (Basel) 2018; 8:E36. [PMID: 29783760 PMCID: PMC6023429 DOI: 10.3390/diagnostics8020036] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/14/2018] [Revised: 05/15/2018] [Accepted: 05/16/2018] [Indexed: 11/28/2022] Open
Abstract
Currently, breast cancer often requires invasive biopsies for diagnosis, motivating researchers to design and develop non-invasive and automated diagnosis systems. Recent microwave breast imaging studies have shown how backscattered signals carry relevant information about the shape of a tumour, and tumour shape is often used with current imaging modalities to assess malignancy. This paper presents a comprehensive analysis of microwave breast diagnosis systems which use machine learning to learn characteristics of benign and malignant tumours. The state-of-the-art, the main challenges still to overcome and potential solutions are outlined. Specifically, this work investigates the benefit of signal pre-processing on diagnostic performance, and proposes a new set of extracted features that capture the tumour shape information embedded in a signal. This work also investigates if a relationship exists between the antenna topology in a microwave system and diagnostic performance. Finally, a careful machine learning validation methodology is implemented to guarantee the robustness of the results and the accuracy of performance evaluation.
Affiliation(s)
- Bárbara L Oliveira
  - Electrical and Electronic Engineering, National University of Ireland Galway, Galway H91 TK33, Ireland.
- Daniela Godinho
  - Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal.
- Martin O'Halloran
  - Translational Medical Device Lab, National University of Ireland Galway, Galway H91 TK33, Ireland.
- Martin Glavin
  - Electrical and Electronic Engineering, National University of Ireland Galway, Galway H91 TK33, Ireland.
- Edward Jones
  - Electrical and Electronic Engineering, National University of Ireland Galway, Galway H91 TK33, Ireland.
- Raquel C Conceição
  - Instituto de Biofísica e Engenharia Biomédica, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal.
|
28
|
Schlichting S, Willemsen T, Ehlers H, Morgner U, Ristau D. Fourier-transform spectral interferometry for in situ group delay dispersion monitoring of thin film coating processes. OPTICS EXPRESS 2016; 24:22516-22527. [PMID: 27828322 DOI: 10.1364/oe.24.022516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
A fast Fourier-based measurement system to determine phase, group delay, and group delay dispersion during optical coating processes is proposed. The in situ method is based on a Michelson interferometer with a broad band light source and a very fast spectrometer. To our knowledge, group delay dispersion measurements directly on the moving substrates during a deposition process for complex interference coatings have been demonstrated for the first time. Especially for the very precise production of chirped mirrors it is advantageous to get information about the phase properties of the grown layer stack to react on errors and retrieve more information about the coating process.
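The three quantities named in this abstract are related by differentiation: group delay is the first derivative of spectral phase with respect to angular frequency, and group delay dispersion is the second. The sketch below estimates both by central differences on a uniformly sampled phase curve. It is an illustration only: the function names are hypothetical, the uniform grid is an assumption, and the sign convention (positive derivative, as is common in optics) differs from the negative-derivative convention used for group delay in speech processing. It is not the measurement procedure of the paper.

```python
def group_delay(phase, d_omega):
    """Central-difference estimate of d(phi)/d(omega) at interior samples.

    phase   -- unwrapped spectral phase in radians on a uniform omega grid
    d_omega -- grid spacing in rad/s (assumed constant)
    """
    return [(phase[i + 1] - phase[i - 1]) / (2.0 * d_omega)
            for i in range(1, len(phase) - 1)]


def group_delay_dispersion(phase, d_omega):
    """Second central difference, d^2(phi)/d(omega)^2, at interior samples."""
    return [(phase[i + 1] - 2.0 * phase[i] + phase[i - 1]) / d_omega ** 2
            for i in range(1, len(phase) - 1)]
```

For a purely quadratic phase, both differences are exact up to rounding, so the dispersion estimate is constant across the band. Measured phase is noisy, and second differences amplify that noise, so a practical system would likely smooth or fit the phase first.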
|
29
|
Mill RW, Brown GJ. Utilising temporal signal features in adverse noise conditions: Detection, estimation, and the reassigned spectrogram. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 139:904-917. [PMID: 26936571 DOI: 10.1121/1.4941566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Visual displays in passive sonar based on the Fourier spectrogram are underpinned by detection models that rely on signal and noise power statistics. Time-frequency representations specialised for sparse signals achieve a sharper signal representation, either by reassigning signal energy based on temporal structure or by conveying temporal structure directly. However, temporal representations involve nonlinear transformations that make it difficult to reason about how they respond to additive noise. This article analyses the effect of noise on temporal fine structure measurements such as zero crossings and instantaneous frequency. Detectors that rely on zero crossing intervals, intervals and peak amplitudes, and instantaneous frequency measurements are developed, and evaluated for the detection of a sinusoid in Gaussian noise, using the power detector as a baseline. Detectors that rely on fine structure outperform the power detector under certain circumstances; and detectors that rely on both fine structure and power measurements are superior. Reassigned spectrograms assume that the statistics used to reassign energy are reliable, but the derivation of the fine structure detectors indicates the opposite. The article closes by proposing and demonstrating the concept of a doubly reassigned spectrogram, wherein temporal measurements are reassigned according to a statistical model of the noise background.
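Zero-crossing intervals are one of the temporal fine structure measurements discussed above. A minimal, noise-free sketch of how they carry frequency information: consecutive zero crossings of a sinusoid are half a period apart, so the mean interval gives a frequency estimate. The function names and the linear interpolation of crossing times are assumptions made for illustration; the detectors in the article are statistical and account for additive noise, which this sketch does not.

```python
import math


def zero_crossing_times(samples, fs):
    """Times (s) where the signal changes sign, by linear interpolation."""
    times = []
    for n in range(len(samples) - 1):
        a, b = samples[n], samples[n + 1]
        # A crossing lies between sample n and n+1 when the sign flips.
        if (a < 0.0 <= b) or (a >= 0.0 > b):
            times.append((n + a / (a - b)) / fs)
    return times


def frequency_from_zero_crossings(samples, fs):
    """Estimate the frequency of a clean sinusoid from its crossing intervals."""
    times = zero_crossing_times(samples, fs)
    if len(times) < 2:
        raise ValueError("need at least two zero crossings")
    mean_interval = (times[-1] - times[0]) / (len(times) - 1)
    # Successive crossings of a sinusoid are half a period apart.
    return 1.0 / (2.0 * mean_interval)


# Example: a 440 Hz tone sampled at 8 kHz for 0.1 s.
fs = 8000.0
tone = [math.sin(2.0 * math.pi * 440.0 * n / fs + 0.3) for n in range(800)]
estimate = frequency_from_zero_crossings(tone, fs)
```

With added noise, spurious crossings appear and the interval statistics change, which is the difficulty the article analyses.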
Affiliation(s)
- Robert W Mill
  - Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, United Kingdom
- Guy J Brown
  - Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, United Kingdom
|
30
|
Luo J, Wiegrebe L. Biomechanical control of vocal plasticity in an echolocating bat. J Exp Biol 2016; 219:878-86. [PMID: 26823102 DOI: 10.1242/jeb.134957] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 11/18/2015] [Accepted: 01/14/2016] [Indexed: 11/20/2022]
Abstract
Many animal species adjust the spectral composition of their acoustic signals to variable environments. However, the physiological foundation of such spectral plasticity is often unclear. The source-filter theory of sound production, initially established for human speech, applies to vocalizations in birds and mammals. According to this theory, adjusting the spectral structure of vocalizations could be achieved by modifying either the laryngeal/syringeal source signal or the vocal tract, which filters the source signal. Here, we show that in pale spear-nosed bats, spectral plasticity induced by moderate level background noise is dominated by the vocal tract rather than the laryngeal source signal. Specifically, we found that with increasing background noise levels, bats consistently decreased the spectral centroid of their echolocation calls up to 3.2 kHz, together with other spectral parameters. In contrast, noise-induced changes in fundamental frequency were small (maximally 0.1 kHz) and were inconsistent across individuals. Changes in spectral centroid did not correlate with changes in fundamental frequency, whereas they correlated negatively with changes in call amplitude. Furthermore, while bats consistently increased call amplitude with increasing noise levels (the Lombard effect), increases in call amplitude typically did not lead to increases in fundamental frequency. In summary, our results suggest that at least to a certain degree echolocating bats are capable of adjusting call amplitude, fundamental frequency and spectral parameters independently.
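The spectral centroid referred to above is the weighted mean frequency of a spectrum. A minimal sketch, assuming the weights are power values on a set of frequency bins; whether the study weighted by power or by magnitude is not stated in the abstract, and the function name is hypothetical.

```python
def spectral_centroid(freqs_hz, power):
    """Weighted mean frequency: sum(f * P) / sum(P)."""
    if len(freqs_hz) != len(power) or not freqs_hz:
        raise ValueError("freqs_hz and power must be non-empty and equal length")
    total = sum(power)
    if total <= 0.0:
        raise ValueError("spectrum must carry positive total power")
    return sum(f * p for f, p in zip(freqs_hz, power)) / total
```

Moving energy from the upper bins to the lower bins lowers the centroid even when the fundamental frequency is unchanged, which is why the two measures can vary independently.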
Affiliation(s)
- Jinhong Luo
  - Max Planck Institute for Ornithology, Acoustic and Functional Ecology Group, Eberhard-Gwinner-Straße, Seewiesen 82319, Germany
  - Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Großhaderner Straße 2, Planegg-Martinsried 82152, Germany
- Lutz Wiegrebe
  - Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Großhaderner Straße 2, Planegg-Martinsried 82152, Germany
|
31
|
Ballard MS, Frisk GV, Becker KM. Estimates of the temporal and spatial variability of ocean sound speed on the New Jersey shelf. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:3316-3326. [PMID: 24907795 DOI: 10.1121/1.4875715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Estimates of the spatial and temporal variability of ocean sound speed on the New Jersey shelf were obtained using acoustic signals measured by a set of freely drifting buoys. The range- and time-dependent inversion problem is computationally intensive and a linearized perturbative algorithm was applied to obtain results in an efficient manner. The inversion algorithm uses estimates of modal travel time to determine sound speed as a function of range and depth. In order to handle the high volume of data associated with the acoustic sensing network, the modal travel time estimation process was automated using an adaptive time-frequency signal processing method known as time-warping. Time-warping is a model-based transform that converts the frequency-dependent modal arrivals to monotones in the warped domain where they can be easily filtered. The data analyzed in this paper were collected on 16 March 2011 on the New Jersey shelf when the ocean was relatively well-mixed. While the observed sound-speed variations are small, both spatial and temporal trends are observed in the results. Furthermore, the estimated sound-speed profiles show good agreement with temporally and spatially collocated measurements.
Affiliation(s)
- Megan S Ballard
  - Applied Research Laboratories, The University of Texas at Austin, Austin, Texas 78713
- George V Frisk
  - Department of Ocean and Mechanical Engineering, Florida Atlantic University, Dania Beach, Florida 33004
- Kyle M Becker
  - Applied Research Laboratory, Pennsylvania State University, State College, Pennsylvania 16804
|
32
|
Etchemendy PE, Eguia MC, Mesz B. Principal pitch of frequency-modulated tones with asymmetrical modulation waveform: a comparison of models. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:1344-1355. [PMID: 24606273 DOI: 10.1121/1.4863649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
In this work, the overall perceived pitch (principal pitch) of pure tones modulated in frequency with an asymmetric waveform is studied. The dependence of the principal pitch on the degree of asymmetric modulation was obtained from a psychophysical experiment. The modulation waveform consisted of a flat portion of constant frequency and two linear segments forming a peak. Consistent with previous results, significant pitch shifts with respect to the time-averaged geometric mean were observed. The direction of the shifts was always toward the flat portion of the modulation. The results from the psychophysical experiment, along with those obtained from previously reported studies, were compared with the predictions of six models of pitch perception proposed in the literature. Even though no single model was able to predict accurately the perceived pitch for all experiments, there were two models that give robust predictions that are within the range of acceptable tuning of modulated tones for almost all the cases. Both models point to the existence of an underlying "stability sensitive" mechanism for the computation of pitch that gives more weight to the portion of the stimuli where the frequency is changing more slowly.
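The reference point for the pitch shifts above is the time-averaged geometric mean of the instantaneous frequency. A minimal sketch of that baseline for a uniformly sampled frequency trajectory follows. The function name and the example trajectory (a flat portion followed by a short linear peak, with segments linear in Hz) are assumptions made for illustration and are not the stimuli of the experiment.

```python
import math


def geometric_mean_frequency(freqs_hz):
    """Time-averaged geometric mean of a uniformly sampled frequency track."""
    if not freqs_hz or min(freqs_hz) <= 0.0:
        raise ValueError("frequencies must be a non-empty list of positive values")
    return math.exp(sum(math.log(f) for f in freqs_hz) / len(freqs_hz))


# Hypothetical modulation cycle: flat at 1000 Hz, then a linear rise and fall.
cycle = [1000.0] * 6 + [1025.0, 1050.0, 1025.0]
baseline = geometric_mean_frequency(cycle)
```

A reported shift toward the flat portion means the perceived pitch sits closer to 1000 Hz than this baseline would suggest for such a cycle.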
Affiliation(s)
- Pablo E Etchemendy
  - Laboratorio de Acústica y Percepción Sonora, Universidad Nacional de Quilmes, R. S. Pena 352 Bernal, B1876BXD Buenos Aires, Argentina
- Manuel C Eguia
  - Laboratorio de Acústica y Percepción Sonora, Universidad Nacional de Quilmes, R. S. Pena 352 Bernal, B1876BXD Buenos Aires, Argentina
- Bruno Mesz
  - Laboratorio de Acústica y Percepción Sonora, Universidad Nacional de Quilmes, R. S. Pena 352 Bernal, B1876BXD Buenos Aires, Argentina
|
33
|
Piovesan D, Pierobon A, DiZio P, Lackner JR. Experimental measure of arm stiffness during single reaching movements with a time-frequency analysis. J Neurophysiol 2013; 110:2484-96. [PMID: 23945781 DOI: 10.1152/jn.01013.2012] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/22/2022] Open
Abstract
We tested an innovative method to estimate joint stiffness and damping during multijoint unfettered arm movements. The technique employs impulsive perturbations and a time-frequency analysis to estimate the arm's mechanical properties along a reaching trajectory. Each single impulsive perturbation provides a continuous estimation on a single-reach basis, making our method ideal to investigate motor adaptation in the presence of force fields and to study the control of movement in impaired individuals with limited kinematic repeatability. In contrast with previous dynamic stiffness studies, we found that stiffness varies during movement, achieving levels higher than during static postural control. High stiffness was associated with elevated reflexive activity. We observed a decrease in stiffness and a marked reduction in long-latency reflexes around the reaching movement velocity peak. This pattern could partly explain the difference between the high stiffness reported in postural studies and the low stiffness measured in dynamic estimation studies, where perturbations are typically applied near the peak velocity point.
Affiliation(s)
- Davide Piovesan
- Sensory Motor Performance Program (SMPP), Rehabilitation Institute of Chicago, Chicago, Illinois
|
34
|
Dura-Bernal S, Garreau G, Georgiou J, Andreou AG, Denham SL, Wennekers T. Multimodal integration of micro-Doppler sonar and auditory signals for behavior classification with convolutional networks. Int J Neural Syst 2013; 23:1350021. [PMID: 23924412 DOI: 10.1142/s0129065713500214] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
The ability to recognize the behavior of individuals is of great interest in the general field of safety (e.g. building security, crowd control, transport analysis, independent living for the elderly). Here we report a new real-time acoustic system for human action and behavior recognition that integrates passive audio and active micro-Doppler sonar signatures over multiple time scales. The system architecture is based on a six-layer convolutional neural network, trained and evaluated using a dataset of 10 subjects performing seven different behaviors. Probabilistic combination of system output through time for each modality separately yields 94% (passive audio) and 91% (micro-Doppler sonar) correct behavior classification; probabilistic multimodal integration increases classification performance to 98%. This study supports the efficacy of micro-Doppler sonar systems in characterizing human actions, which can then be efficiently classified using ConvNets. It also demonstrates that the integration of multiple sources of acoustic information can significantly improve the system's performance.
Affiliation(s)
- Salvador Dura-Bernal
- Department of Physiology and Pharmacology, SUNY Downstate, 450 Clarkson Avenue, Brooklyn, NY 11203, USA.
|
35
|
Oppenheim JN, Isakov P, Magnasco MO. Degraded time-frequency acuity to time-reversed notes. PLoS One 2013; 8:e65386. [PMID: 23799012 PMCID: PMC3684602 DOI: 10.1371/journal.pone.0065386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2013] [Accepted: 04/26/2013] [Indexed: 12/02/2022] Open
Abstract
Time-reversal symmetry breaking is a key feature of many classes of natural sounds, originating in the physics of sound production. While attention has been paid to the response of the auditory system to “natural stimuli,” very few psychophysical tests have been performed. We conduct psychophysical measurements of time-frequency acuity for stylized representations of “natural”-like notes (sharp attack, long decay) and the time-reversed versions of these notes (long attack, sharp decay). Our results demonstrate significantly greater precision, arising from enhanced temporal acuity, for such sounds over their time-reversed versions, without a corresponding decrease in frequency acuity. These data argue against models of auditory processing that include tradeoffs between temporal and frequency acuity, at least in the range of notes tested, and suggest the existence of statistical priors for notes with a sharp attack and a long decay. We are additionally able to calculate a minimal theoretical bound on the sophistication of the nonlinearities in auditory processing. We find that among the best studied classes of nonlinear time-frequency representations, only matching pursuit, spectral derivatives, and reassigned spectrograms are able to satisfy this criterion.
Affiliation(s)
- Jacob N. Oppenheim
- Laboratory of Mathematical Physics, Rockefeller University, New York, New York, United States of America
- Pavel Isakov
- Laboratory of Mathematical Physics, Rockefeller University, New York, New York, United States of America
- Marcelo O. Magnasco
- Laboratory of Mathematical Physics, Rockefeller University, New York, New York, United States of America
|
36
|
Oppenheim JN, Magnasco MO. Human time-frequency acuity beats the Fourier uncertainty principle. PHYSICAL REVIEW LETTERS 2013; 110:044301. [PMID: 25166166 DOI: 10.1103/physrevlett.110.044301] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/24/2012] [Revised: 11/13/2012] [Indexed: 06/03/2023]
Abstract
The time-frequency uncertainty principle states that the product of the temporal and frequency extents of a signal cannot be smaller than 1/(4π). We study human ability to simultaneously judge the frequency and the timing of a sound. Our subjects often exceeded the uncertainty limit, sometimes by more than tenfold, mostly through remarkable timing acuity. Our results establish a lower bound for the nonlinearity and complexity of the algorithms employed by our brains in parsing transient sounds, rule out simple "linear filter" models of early auditory processing, and highlight timing acuity as a central feature in auditory object processing.
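As a rough numerical illustration of the 1/(4π) bound quoted in this abstract, the sketch below measures the RMS widths of |x(t)|² and |X(f)|² for a baseband Gaussian pulse, the signal that is known to meet the bound with equality. The function name `time_frequency_spread`, the sampling rate, and the pulse width are illustrative assumptions and are not taken from the paper; the paper's psychophysical measure of "extent" may be defined differently.

```python
import numpy as np

def time_frequency_spread(x, fs):
    """Return (sigma_t, sigma_f): RMS widths of |x(t)|^2 and |X(f)|^2."""
    n = len(x)
    t = np.arange(n) / fs
    # Treat the normalised energy density in time as a probability mass.
    p_t = np.abs(x) ** 2
    p_t = p_t / p_t.sum()
    mean_t = np.sum(t * p_t)
    sigma_t = np.sqrt(np.sum((t - mean_t) ** 2 * p_t))

    # Same treatment for the energy density over (two-sided) frequency.
    f = np.fft.fftfreq(n, d=1.0 / fs)
    p_f = np.abs(np.fft.fft(x)) ** 2
    p_f = p_f / p_f.sum()
    mean_f = np.sum(f * p_f)
    sigma_f = np.sqrt(np.sum((f - mean_f) ** 2 * p_f))
    return sigma_t, sigma_f

fs = 8000.0                                   # assumed sampling rate in Hz
t = np.arange(-0.5, 0.5, 1.0 / fs)
gaussian = np.exp(-t ** 2 / (2 * 0.01 ** 2))  # assumed 10 ms Gaussian envelope at 0 Hz
sigma_t, sigma_f = time_frequency_spread(gaussian, fs)
product = sigma_t * sigma_f
bound = 1.0 / (4.0 * np.pi)
print(product, bound)
```

A real sinusoid under the same envelope would give a much larger `sigma_f` here, because the two-sided spectrum has mirrored peaks; the bound is usually stated for the analytic (one-sided) or baseband signal.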
Affiliation(s)
- Jacob N Oppenheim
- Laboratory of Mathematical Physics, Rockefeller University, New York, New York 10065, USA
- Marcelo O Magnasco
- Laboratory of Mathematical Physics, Rockefeller University, New York, New York 10065, USA
|
37
|
Piovesan D, Morasso P, Giannoni P, Casadio M. Arm stiffness during assisted movement after stroke: the influence of visual feedback and training. IEEE Trans Neural Syst Rehabil Eng 2012. [PMID: 23193322 DOI: 10.1109/tnsre.2012.2226915] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/05/2022]
Abstract
Spasticity and muscular hypertonus are frequently found in stroke survivors and may have a significant effect on functional impairment. These abnormal neuro-muscular properties, which are quantifiable by the net impedance of the hand, have a direct consequence on arm mechanics and are likely to produce anomalous motor paths. Literature studies quantifying limb impedance in stroke survivors have focused on multijoint static tasks and single joint movements. Despite this research, little is known about the role of sensory motor integration in post-stroke impedance modulation. The present study elucidates this role by integrating an evaluation of arm impedance into a robotically mediated therapy protocol. Our analysis had three specific objectives: 1) obtaining a reliable measure for the mechanical properties of the upper limb during robotic therapy; 2) investigating the effects of robot-assisted training and visual feedback on arm stiffness and viscosity; 3) determining if the stiffness measure and its relationship with either training or visual feedback depend on arm position, speed, and level of assistance. This work demonstrates that the performance improvements produced by minimally assistive robot training are associated with decreased viscosity and stiffness in stroke survivors' paretic arm and that these mechanical impedance components are partially modulated by visual feedback.
Affiliation(s)
- Davide Piovesan
- Department of Physical Medicine and Rehabilitation, Northwestern University, Chicago, IL 60611, USA.
|
38
|
Ingle AN, Sethares WA. The least-squares invertible constant-Q spectrogram and its application to phase vocoding. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:894-903. [PMID: 22894212 DOI: 10.1121/1.4731466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
This paper discusses the development of a constant-Q spectrogram representation that is invertible in a least-squares sense. A good quality inverse is possible because this modified transform method, unlike the usual sliding window constant-Q spectrogram, does not discard data samples when performing the variable length discrete Fourier transforms on the signal. The development of a phase vocoder application using this modified technique is also discussed. It is shown that a phase vocoder constructed using the least-squares invertible constant-Q spectrogram (LSICQS) is not a trivial extension of the regular FFT-based phase vocoder algorithm and some of the mathematical subtleties related to phase reassignment are addressed.
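To make the "variable length discrete Fourier transforms" mentioned in this abstract concrete, the sketch below lays out the bin centre frequencies and window lengths of a conventional constant-Q analysis, where every bin keeps the same ratio of centre frequency to bandwidth. It follows the common textbook formulation and is not the least-squares invertible variant the paper develops; the function name `constant_q_layout` and all parameter values are assumptions chosen for illustration.

```python
import math

def constant_q_layout(f_min, f_max, bins_per_octave, fs):
    """Return (Q, [(centre_frequency_hz, window_length_samples), ...])."""
    # Q is the fixed ratio of centre frequency to bandwidth for geometric bin spacing.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    n_bins = int(math.ceil(bins_per_octave * math.log2(f_max / f_min)))
    layout = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Each window spans about Q periods of its centre frequency,
        # so low bins need long windows and high bins short ones.
        n_k = int(math.ceil(q * fs / f_k))
        layout.append((f_k, n_k))
    return q, layout

# Assumed example: semitone spacing over seven octaves at 44.1 kHz.
q, layout = constant_q_layout(f_min=55.0, f_max=7040.0, bins_per_octave=12, fs=44100.0)
for f_k, n_k in layout[::12]:
    print(f"{f_k:8.1f} Hz  window = {n_k} samples")
```

Because the window lengths differ per bin, a sliding-window implementation covers the signal unevenly across frequency, which is the property the abstract identifies as the obstacle to a good inverse.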
Affiliation(s)
- A N Ingle
- Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, USA.
|
39
|
Piovesan D, Pierobon A, DiZio P, Lackner JR. Measuring multi-joint stiffness during single movements: numerical validation of a novel time-frequency approach. PLoS One 2012; 7:e33086. [PMID: 22448233 PMCID: PMC3309009 DOI: 10.1371/journal.pone.0033086] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 03/28/2011] [Accepted: 02/09/2012] [Indexed: 11/19/2022] Open
Abstract
This study presents and validates a Time-Frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regressive methods which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regressive techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher-than-second-order systems with a nonparametric approach. The technique proposed here is highly robust to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases.
Affiliation(s)
- Davide Piovesan
- Sensory Motor Performance Program, Rehabilitation Institute of Chicago, Chicago, Illinois, United States of America.
|
40
|
Jin F, Krishnan SS, Sattar F. Adventitious sounds identification and extraction using temporal-spectral dominance-based features. IEEE Trans Biomed Eng 2011; 58:3078-87. [PMID: 21712152 DOI: 10.1109/tbme.2011.2160721] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 11/06/2022]
Abstract
Respiratory sound (RS) signals carry significant information about the underlying functioning of the pulmonary system by the presence of adventitious sounds (ASs). Although many studies have addressed the problem of pathological RS classification, only a limited number of scientific works have focused on the analysis of the evolution of symptom-related signal components in the joint time-frequency (TF) plane. This paper proposes a new signal identification and extraction method for various ASs based on instantaneous frequency (IF) analysis. The presented TF decomposition method produces a noise-resistant, high-definition TF representation of RS signals compared with conventional linear TF analysis methods, while preserving low computational complexity compared with quadratic TF analysis methods. The phase information discarded in the conventional spectrogram has been adopted for the estimation of IF and group delay, and a temporal-spectral dominance spectrogram has subsequently been constructed by investigating the TF spreads of the computed time-corrected IF components. The proposed dominance measure enables the extraction of signal components corresponding to ASs from noisy RS signals at high noise levels. A new set of TF features has also been proposed to quantify the shapes of the obtained TF contours, and therefore strongly enhances the identification of multicomponent signals such as polyphonic wheezes. An overall accuracy of 92.4±2.9% for the classification of real RS recordings shows the promising performance of the presented method.
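The idea of recovering instantaneous frequency from the spectrogram phase that is normally discarded can be sketched in a few lines. The example below is a generic phase-difference estimate between two frames one sample apart, not the authors' temporal-spectral dominance method; the function name `channelized_instantaneous_frequency`, the Hann window, and the test tone are assumptions for illustration only.

```python
import numpy as np

def channelized_instantaneous_frequency(x, fs, start, n_fft):
    """Per-bin instantaneous frequency (Hz) and magnitude for one analysis frame."""
    window = np.hanning(n_fft)
    # Two frames offset by one sample: their phase difference per bin is the
    # phase advance per sample of whatever component dominates that bin.
    spec_a = np.fft.rfft(x[start:start + n_fft] * window)
    spec_b = np.fft.rfft(x[start + 1:start + 1 + n_fft] * window)
    phase_step = np.angle(spec_b * np.conj(spec_a))  # radians per sample, in (-pi, pi]
    return phase_step * fs / (2.0 * np.pi), np.abs(spec_a)

fs = 8000.0          # assumed sampling rate in Hz
tone_hz = 1003.7     # assumed test tone, deliberately off the FFT bin grid
n = np.arange(2048)
x = np.cos(2.0 * np.pi * tone_hz * n / fs)

inst_freq, magnitude = channelized_instantaneous_frequency(x, fs, start=0, n_fft=512)
peak_bin = int(np.argmax(magnitude))
estimated_hz = inst_freq[peak_bin]
bin_centre_hz = peak_bin * fs / 512
print(bin_centre_hz, estimated_hz)
```

With a 512-point frame the bin spacing is about 15.6 Hz, so the bin centre alone locates the tone only coarsely, while the phase-derived value lands very close to the true frequency. Group delay can be estimated in the same spirit from the phase difference between adjacent frequency bins.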
Affiliation(s)
- Feng Jin
- Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada.
|
41
|
Murugappan S, Boyce S, Khosla S, Kelchner L, Gutmark E. Acoustic characteristics of phonation in "wet voice" conditions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 127:2578-89. [PMID: 20370039 PMCID: PMC2865707 DOI: 10.1121/1.3308478] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 05/11/2009] [Revised: 01/11/2010] [Accepted: 01/13/2010] [Indexed: 05/29/2023]
Abstract
A perceptible change in phonation characteristics after a swallow has long been considered evidence that food and/or drink material has entered the laryngeal vestibule and is on the surface of the vocal folds as they vibrate. The current paper investigates the acoustic characteristics of phonation when liquid material is present on the vocal folds, using ex vivo porcine larynges as a model. Consistent with instrumental examinations of swallowing disorders or dysphagia in humans, three liquids of different Varibar viscosity ("thin liquid," "nectar," and "honey") were studied at constant volume. The presence of materials on the folds during phonation was generally found to suppress the higher frequency harmonics and generate intermittent additional frequencies in the low and high end of the acoustic spectrum. Perturbation measures showed a higher percentage of jitter and shimmer when liquid material was present on the folds during phonation, but they were unable to differentiate statistically between the three fluid conditions. The finite correlation dimension and positive Lyapunov exponent measures indicated that the presence of materials on the vocal folds excited a chaotic system. Further, these measures were able to reliably differentiate between the baseline and different types of liquid on the vocal folds.
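The jitter and shimmer percentages mentioned in this abstract are cycle-to-cycle perturbation measures. The sketch below uses one common "local" definition (mean absolute difference between consecutive cycles divided by the mean value); the abstract does not say which variant the authors used, so this definition, the function name `local_perturbation_percent`, and the sample values are all assumptions.

```python
import numpy as np

def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean value, in percent."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two cycles")
    return 100.0 * np.mean(np.abs(np.diff(values))) / np.mean(values)

# Hypothetical per-cycle measurements from a voiced segment.
periods_s = [0.0100, 0.0110, 0.0100, 0.0110]   # glottal cycle durations in seconds
peak_amplitudes = [1.00, 0.90, 1.00, 0.90]     # per-cycle peak amplitudes, arbitrary units

jitter_percent = local_perturbation_percent(periods_s)         # perturbation of period
shimmer_percent = local_perturbation_percent(peak_amplitudes)  # perturbation of amplitude
print(round(jitter_percent, 2), round(shimmer_percent, 2))
```

Applying the same function to periods gives jitter and to amplitudes gives shimmer, which is why the two are usually reported together.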
Affiliation(s)
- Shanmugam Murugappan
- Department of Otolaryngology, Head and Neck Surgery, University of Cincinnati Medical Center, 231 Albert B. Sabin Way, Cincinnati, OH 45267-0528, USA.
|
42
|
Fulop SA. Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 127:2114-7. [PMID: 20369992 DOI: 10.1121/1.3308476] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/21/2023]
Abstract
This brief report describes a small study which was undertaken with nine synthetic vowel tokens, in an effort to demonstrate the validity of the reassigned spectrogram as a formant measurement tool. The reassigned spectrogram's performance is also compared with that of a typical pitch-asynchronous linear predictive analysis and is found to be superior. In this study, reassigned spectrograms were further processed to highlight the formants and then were used to measure these synthetic vowel formants generally to within 0.5% of their known true values, far surpassing the accuracy of a typical linear predictive analysis procedure which was inaccurate by as much as 17%. The overall accuracy of reassigned spectrographic formant measurement is thus demonstrated in these cases.
Affiliation(s)
- Sean A Fulop
- Department of Linguistics, California State University, Fresno, 5245 N. Backer Avenue, Fresno, CA 93740-8001, USA.
|
43
|
Ito M, Yano M. Sinusoidal modeling for nonstationary voiced speech based on a local vector transform. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 121:1717-27. [PMID: 17407908 DOI: 10.1121/1.2431581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/14/2023]
Abstract
A voiced speech signal can be expressed as a sum of sinusoidal components whose instantaneous frequency and amplitude vary continuously with time. When these parameters are determined from the input, their time-varying characteristics are a crucial source of error for algorithms that assume stationarity within a local analysis segment. To overcome this problem, a new method is proposed, the local vector transform (LVT), which can determine instantaneous frequency and amplitude for nonstationary sinusoids. The method does not assume local stationarity. The effectiveness of LVT was examined in parameter determination for synthesized and naturally uttered speech signals. The instantaneous frequency for the first harmonic component was determined with an accuracy almost equal to that of the time-corrected instantaneous frequency method and higher than that of spectral peak-picking, autocorrelation, and cepstrum. The instantaneous amplitude was also determined accurately by LVT, while considerable errors remained in the other algorithms. The signal reconstructed from the parameters determined by LVT agreed well with the corresponding component of voiced speech. These results suggest that the method is effective for analyzing time-varying voiced speech signals.
Affiliation(s)
- Masashi Ito
- Research Institute of Electrical Communication, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan
|
44
|
Fulop SA, Fitz K. Separation of components from impulses in reassigned spectrograms. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 121:1510-8. [PMID: 17407888 DOI: 10.1121/1.2431329] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/14/2023]
Abstract
Two computational methods for pruning a reassigned spectrogram to show only quasisinusoidal components, or only impulses, or both, are presented mathematically and provided with step-by-step algorithms. Both methods compute the second-order mixed partial derivative of the short-time Fourier transform phase, and rely on the conditions that components and impulses are each well-represented by reassigned spectrographic points possessing particular values of this derivative. This use of the mixed second-order derivative was introduced by Nelson [J. Acoust. Soc. Am. 110, 2575-2592 (2001)] but here our goals are to completely describe the computation of this derivative in a way that highlights the relations to the two most influential methods of computing a reassigned spectrogram, and also to demonstrate the utility of this technique for plotting spectrograms showing line components or impulses while excluding most other points. When applied to speech signals, vocal tract resonances (formants) or glottal pulsations can be effectively isolated in expanded views of the phonation process.
Affiliation(s)
- Sean A Fulop
- Department of Linguistics, California State University, Fresno, California 93740-8001, USA.
|