1
Zhang Y, Jiang W, Sun L, Wang J, Zheng X, Xue Q. A Deep Learning-Based Generalized Empirical Flow Model of Glottal Flow During Normal Phonation. J Biomech Eng 2022; 144:091001. [PMID: 35171218 PMCID: PMC8990722 DOI: 10.1115/1.4053862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 02/10/2022] [Indexed: 11/08/2022]
Abstract
This paper proposes a deep learning-based generalized empirical flow model (EFM) that provides fast and accurate prediction of the glottal flow during normal phonation. The approach assumes that the vibration of the vocal folds can be represented by a universal kinematics equation (UKE), which is used to generate a glottal shape library. For each shape in the library, the ground truth values of the flow rate and pressure distribution are obtained from the high-fidelity Navier-Stokes (N-S) solution. A fully connected deep neural network (DNN) is then trained to build the empirical mapping between the shapes and the flow rate and pressure distributions. The resulting DNN-based EFM is coupled with a finite element method (FEM)-based solid dynamics solver for fluid-structure interaction (FSI) simulation of phonation. The EFM is evaluated by comparing its predictions with the N-S solutions for both static glottal shapes and FSI simulations. The results demonstrate good prediction performance in both accuracy and efficiency.
Affiliation(s)
- Yang Zhang
  - Department of Mechanical Engineering, University of Maine, Orono, ME 04469
- Weili Jiang
  - Department of Mechanical Engineering, University of Maine, 204 Crosby Hall, Orono, ME 04473
- Luning Sun
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN 46556
- Jianxun Wang
  - Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN 46556
- Xudong Zheng
  - Department of Mechanical Engineering, University of Maine, Room 213 A, Boardman Hall, Orono, ME 04473
- Qian Xue
  - Department of Mechanical Engineering, University of Maine, Room 213, Boardman Hall, Orono, ME 04473
2
Weerathunge HR, Alzamendi GA, Cler GJ, Guenther FH, Stepp CE, Zañartu M. LaDIVA: A neurocomputational model providing laryngeal motor control for speech acquisition and production. PLoS Comput Biol 2022; 18:e1010159. [PMID: 35737706 PMCID: PMC9258861 DOI: 10.1371/journal.pcbi.1010159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Revised: 07/06/2022] [Accepted: 05/02/2022] [Indexed: 11/18/2022]
Abstract
Many voice disorders are the result of intricate neural and/or biomechanical impairments that are poorly understood. The limited knowledge of their etiological and pathophysiological mechanisms hampers effective clinical management. Behavioral studies have been used concurrently with computational models to better understand typical and pathological laryngeal motor control. Thus far, however, a unified computational framework that quantitatively integrates physiologically relevant models of phonation with the neural control of speech has not been developed. Here, we introduce LaDIVA, a novel neurocomputational model with physiologically based laryngeal motor control. We combined the DIVA model (an established neural network model of speech motor control) with the extended body-cover model (a physics-based vocal fold model). The resulting integrated model, LaDIVA, was validated by comparing its simulations with behavioral responses to perturbations of auditory vocal fundamental frequency (fo) feedback in adults with typical speech. LaDIVA demonstrated the capability to simulate different modes of laryngeal motor control, ranging from short-term (i.e., reflexive) and long-term (i.e., adaptive) auditory feedback paradigms to the generation of prosodic contours in speech. Simulations showed that LaDIVA’s laryngeal motor control displays properties of motor equivalence, i.e., LaDIVA could robustly generate compensatory responses to reflexive vocal fo perturbations, with varying initial laryngeal muscle activation levels leading to the same output. The model can also generate prosodic contours for studying laryngeal motor control in running speech. LaDIVA can expand the understanding of the physiology of human phonation to enable, for the first time, the investigation of causal effects of neural motor control on the fine structure of the vocal signal.
Affiliation(s)
- Hasini R. Weerathunge
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts, United States of America
- Gabriel A. Alzamendi
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
  - Institute for Research and Development on Bioengineering and Bioinformatics (IBB), CONICET-UNER, Oro Verde, Argentina
- Gabriel J. Cler
  - Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, United States of America
- Frank H. Guenther
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts, United States of America
- Cara E. Stepp
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts, United States of America
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts, United States of America
  - Department of Otolaryngology-Head and Neck Surgery, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Matías Zañartu
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
3
Zhang Y, Pu T, Zhou C, Cai H. An Improved Glottal Flow Model Based on Seq2Seq LSTM for Simulation of Vocal Fold Vibration. J Voice 2022:S0892-1997(22)00102-3. [PMID: 35534328 DOI: 10.1016/j.jvoice.2022.03.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 03/29/2022] [Accepted: 03/30/2022] [Indexed: 10/18/2022]
Abstract
OBJECTIVES: This paper proposes an improved data-driven glottal flow model for fluid-structure interaction (FSI) simulation of vocal fold vibration. The model aims to improve on the accuracy and efficiency of the previously developed deep neural network (DNN)-based empirical flow model (EFM) [1]. METHODS: A Seq2Seq long short-term memory (LSTM) network is employed to infer the flow rate and pressure distribution from the subglottal pressure and the cross-sectional area distribution of the glottis. The training data are collected from the generalized glottal shape library generated in Zhang et al. [1]. RESULTS AND CONCLUSIONS: Compared with the EFM, the present model not only discards the time-consuming optimization process but also drastically reduces the errors, which greatly improves the prediction performance. The present model is evaluated by coupling it with a solid dynamics solver for FSI simulation, and the results demonstrate a substantial improvement in accuracy and efficiency.
Affiliation(s)
- Yang Zhang
  - College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
- Tianmei Pu
  - College of General Aviation and Flight, Nanjing University of Aeronautics and Astronautics, Nanjing 213300, China
- Chunhua Zhou
  - Department of Aerodynamics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Hongming Cai
  - College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
4
Horáček J, Radolf V, Bula V, Laukkanen AM. Experimental modelling and human data of glottal area declination rate for vowel and semi-occluded vocal tract phonation. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
5
Electroglottography – An Update. J Voice 2020; 34:503-526. [DOI: 10.1016/j.jvoice.2018.12.014] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 11/18/2018] [Revised: 12/27/2018] [Accepted: 12/28/2018] [Indexed: 11/21/2022]
6
Zhang Y, Zheng X, Xue Q. A Deep Neural Network Based Glottal Flow Model for Predicting Fluid-Structure Interactions during Voice Production. APPLIED SCIENCES-BASEL 2020; 10. [PMID: 34306737 PMCID: PMC8299989 DOI: 10.3390/app10020705] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
Abstract
This paper proposes a machine-learning-based reduced-order model that provides fast and accurate prediction of the glottal flow during voice production. The model is based on the Bernoulli equation with a viscous loss term predicted by a deep neural network (DNN) model. The training data of the DNN model come from Navier-Stokes (N-S) equation-based three-dimensional simulations of glottal flow in various glottal shapes generated by a synthetic shape function. These shapes are obtained by superimposing the instantaneous modal displacements during vibration on the prephonatory geometry of the glottis. The input parameters of the DNN model are the geometric and flow parameters extracted from discretized cross sections of the glottal shapes, and the output target is the corresponding flow resistance coefficient. With the trained DNN-Bernoulli model, the flow resistance coefficient, flow rate, and pressure distribution can be predicted for any glottal shape generated by the synthetic shape function. The model is further coupled with a finite element method-based solid dynamics solver for simulating fluid-structure interactions (FSI). The prediction performance of the model for both static shapes and FSI simulations is evaluated by comparing its solutions with those obtained by the Bernoulli and N-S models. The model shows good prediction performance in accuracy and efficiency, suggesting promise for future clinical use.
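The idea of a Bernoulli equation with an added loss term can be illustrated with a minimal Python sketch. This is not the authors' formulation: the pressure balance, the function name `bernoulli_flow_with_loss`, the air density, and the treatment of the region downstream of the narrowest section are all assumptions made for illustration, and `loss_coeff` is a hypothetical stand-in for the flow resistance coefficient that the abstract says a DNN predicts.

```python
import math

RHO_AIR = 1.2  # kg/m^3, assumed air density


def bernoulli_flow_with_loss(p_sub, p_out, areas, loss_coeff, rho=RHO_AIR):
    """Quasi-1D flow rate and pressure along a glottal channel.

    Assumed balance (illustrative, not taken from the paper):
        p_sub - p_out = 0.5 * rho * (Q / A_min)**2 * (1 + loss_coeff)
    Upstream velocity is neglected, and the pressure downstream of the
    narrowest section is set to p_out (assumed flow separation there).
    """
    a_min = min(areas)
    dp = p_sub - p_out
    if a_min <= 0.0 or dp <= 0.0:
        # Closed glottis or no driving pressure: no flow in this sketch.
        return 0.0, [p_sub] * len(areas)

    # Solve the assumed balance for the volume flow rate Q.
    q = a_min * math.sqrt(2.0 * dp / (rho * (1.0 + loss_coeff)))

    i_min = areas.index(a_min)
    pressures = []
    for i, a in enumerate(areas):
        if i <= i_min:
            # Ideal Bernoulli up to and including the narrowest section.
            pressures.append(p_sub - 0.5 * rho * (q / a) ** 2)
        else:
            pressures.append(p_out)
    return q, pressures


# Example: 800 Pa subglottal pressure, three cross sections in m^2.
q, p = bernoulli_flow_with_loss(800.0, 0.0, [1e-4, 2e-5, 5e-5], loss_coeff=0.5)
```

With `loss_coeff=0` the sketch reduces to the ideal Bernoulli model; a larger coefficient lowers the flow rate for the same driving pressure, which is the quantity a learned model would need to supply for each shape.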
7
Zhang Y, Zheng X, Xue Q. A Deep Neural Network Based Glottal Flow Model for Predicting Fluid-Structure Interactions during Voice Production. APPLIED SCIENCES (BASEL, SWITZERLAND) 2020; 10:705. [PMID: 34306737 DOI: 10.3390/app10113794] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 05/26/2023]
Affiliation(s)
- Yang Zhang
  - Department of Mechanical Engineering, University of Maine, Orono, ME 04469, USA
- Xudong Zheng
  - Department of Mechanical Engineering, University of Maine, Orono, ME 04469, USA
- Qian Xue
  - Department of Mechanical Engineering, University of Maine, Orono, ME 04469, USA
8
Zhang X, Wang Y, Zhao W, Wei W, Tao Z, Zhao H. Vocal cord abnormal voice flow field study by modeling a bionic vocal system. Adv Robot 2019. [DOI: 10.1080/01691864.2019.1705907] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Affiliation(s)
- Xiaojun Zhang
  - School of Electronic and Information Engineering, Soochow University, Suzhou, People’s Republic of China
  - School of Optoelectronic Science and Engineering, Soochow University, Suzhou, People’s Republic of China
- Yan Wang
  - School of Optoelectronic Science and Engineering, Soochow University, Suzhou, People’s Republic of China
- Wei Zhao
  - School of Electronic and Information Engineering, Soochow University, Suzhou, People’s Republic of China
- Wei Wei
  - School of Optoelectronic Science and Engineering, Soochow University, Suzhou, People’s Republic of China
- Zhi Tao
  - School of Electronic and Information Engineering, Soochow University, Suzhou, People’s Republic of China
  - School of Optoelectronic Science and Engineering, Soochow University, Suzhou, People’s Republic of China
- Heming Zhao
  - School of Electronic and Information Engineering, Soochow University, Suzhou, People’s Republic of China