1. Wu EQ, Tang Z, Yao Y, Qiu XY, Deng PY, Xiong P, Song A, Zhu LM, Zhou M. Scalable Gamma-Driven Multilayer Network for Brain Workload Detection Through Functional Near-Infrared Spectroscopy. IEEE Transactions on Cybernetics 2022; 52:12464-12478. PMID: 34705661. DOI: 10.1109/tcyb.2021.3116964.
Abstract
This work proposes a scalable gamma non-negative matrix network (SGNMN), which uses Poisson randomized Gamma factor analysis to obtain the neurons of the first layer of the network. These neurons follow a Gamma distribution whose shape parameter infers the neurons of the next layer and their related weights. Upsampled connection weights follow a Dirichlet distribution, while downsampled hidden units follow a Gamma distribution. The parameters of SGNMN are learned by performing up-down sampling on each layer. Experimental results indicate that the width and depth of SGNMN are closely related, and that a reasonable network structure for accurately detecting brain fatigue through functional near-infrared spectroscopy can be obtained by jointly considering network width, depth, and parameters.
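As a concrete reference point, the sketch below implements the plain Poisson-likelihood NMF that the first SGNMN layer generalizes; it is a simplification, not the authors' Gamma-prior up-down sampler, and the fNIRS-like input dimensions are invented for illustration.

```python
import numpy as np

def poisson_nmf(X, k, n_iter=200, eps=1e-9):
    """Multiplicative updates for KL (Poisson-likelihood) NMF: X ~ Poisson(W @ H).
    SGNMN instead places Gamma priors on the factors and infers them by
    up-down sampling across layers; this is only the factor-analysis core."""
    rng = np.random.default_rng(0)
    W = rng.gamma(1.0, 1.0, size=(X.shape[0], k))   # basis (channels x factors)
    H = rng.gamma(1.0, 1.0, size=(k, X.shape[1]))   # activations (factors x time)
    for _ in range(n_iter):
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Hypothetical input: 32 fNIRS channels x 600 time samples, nonnegative.
X = np.abs(np.random.default_rng(1).normal(size=(32, 600)))
W, H = poisson_nmf(X, k=8)
```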
2. Kröger BJ. Computer-Implemented Articulatory Models for Speech Production: A Review. Frontiers in Robotics and AI 2022; 9:796739. PMID: 35494539. PMCID: PMC9040071. DOI: 10.3389/frobt.2022.796739.
Abstract
Modeling speech production and speech articulation is still an evolving research topic. Some current core questions are: What is the underlying (neural) organization for controlling speech articulation? How can speech articulators like the lips and tongue, and their movements, be modeled in an efficient but also biologically realistic way? How can high-quality articulatory-acoustic models be developed to enable high-quality articulatory speech synthesis? Computer modeling thus helps, on the one hand, to uncover the underlying biological as well as acoustic-articulatory concepts of speech production, while further modeling efforts, on the other hand, bring us closer to the goal of high-quality articulatory-acoustic speech synthesis grounded in more detailed knowledge of vocal tract acoustics and speech articulation. Currently, articulatory models cannot reach the quality level of corpus-based speech synthesis. Moreover, biomechanical and neuromuscular approaches are complex and still not usable for sentence-level speech synthesis. This paper lists many computer-implemented articulatory models and provides criteria for dividing articulatory models into different categories. A major recent research question, namely how to control articulatory models in a neurobiologically adequate manner, is discussed in detail. It is concluded that there is a strong need to further develop articulatory-acoustic models, both to test quantitative neurobiologically based control concepts for speech articulation and to uncover the remaining details of human articulatory and acoustic signal generation. Furthermore, these efforts may help approach the goal of establishing high-quality articulatory-acoustic as well as neurobiologically grounded speech synthesis.
3. Woo J, Xing F, Prince JL, Stone M, Gomez AD, Reese TG, Wedeen VJ, El Fakhri G. A deep joint sparse non-negative matrix factorization framework for identifying the common and subject-specific functional units of tongue motion during speech. Medical Image Analysis 2021; 72:102131. PMID: 34174748. PMCID: PMC8316408. DOI: 10.1016/j.media.2021.102131.
Abstract
Intelligible speech is produced by creating varying internal local muscle groupings, i.e., functional units, that are generated in a systematic and coordinated manner. There are two major challenges in characterizing and analyzing functional units. First, due to the complex and convoluted nature of tongue structure and function, it is of great importance to develop a method that can accurately decode complex muscle coordination patterns during speech. Second, it is challenging to keep identified functional units comparable across subjects due to their substantial variability. In this work, to address these challenges, we develop a new deep learning framework to identify common and subject-specific functional units of tongue motion during speech. Our framework hinges on joint deep graph-regularized sparse non-negative matrix factorization (NMF) using motion quantities derived from displacements measured by tagged magnetic resonance imaging. More specifically, we transform NMF with sparse and graph regularizations into modular architectures akin to deep neural networks by unfolding the Iterative Shrinkage-Thresholding Algorithm (ISTA), learning interpretable building blocks and an associated weighting map. We then apply spectral clustering to the common and subject-specific weighting maps, from which we jointly determine the common and subject-specific functional units. Experiments with simulated datasets show that the proposed method achieved clustering performance on par with or better than the comparison methods. Experiments with in vivo tongue motion data show that the proposed method can determine the common and subject-specific functional units with increased interpretability and decreased size variability.
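The unfolding step can be made concrete with a short sketch: each "layer" below is one ISTA iteration for the non-negative sparse coding subproblem, computing the weighting map H for fixed building blocks W. This is a minimal numpy illustration under assumed inputs; in the paper's learned version the dictionary and thresholds become trainable network parameters.

```python
import numpy as np

def unfolded_ista_nmf(X, W, n_layers=10, lam=0.1):
    """K unfolded ISTA iterations for min_{H>=0} ||X - W H||^2 + lam * ||H||_1.
    Each loop iteration corresponds to one layer of the unrolled network."""
    L = np.linalg.norm(W, 2) ** 2          # Lipschitz constant of the gradient
    H = np.zeros((W.shape[1], X.shape[1]))
    for _ in range(n_layers):
        grad = W.T @ (W @ H - X)           # gradient of the data-fit term
        H = H - grad / L                   # gradient step
        H = np.maximum(H - lam / L, 0.0)   # non-negative soft threshold
    return H

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(60, 300)))    # hypothetical motion features
W = np.abs(rng.normal(size=(60, 8)))      # fixed building blocks for the sketch
H = unfolded_ista_nmf(X, W)
```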
Affiliation(s)
- Jonghye Woo: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Fangxu Xing: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jerry L Prince: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Maureen Stone: Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD 21201, USA
- Arnold D Gomez: Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21218, USA
- Timothy G Reese: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02129, USA
- Van J Wedeen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02129, USA
- Georges El Fakhri: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
4. Gick B, Mayer C, Chiu C, Widing E, Roewer-Després F, Fels S, Stavness I. Quantal biomechanical effects in speech postures of the lips. Journal of Neurophysiology 2020; 124:833-843. PMID: 32727259. DOI: 10.1152/jn.00676.2019.
Abstract
The unique biomechanical and functional constraints on human speech make it a promising area for research investigating modular control of movement. The present article illustrates how a modular control approach to speech can provide insights relevant to understanding both motor control and observed variation across languages. We specifically explore the robust typological finding that languages produce different degrees of labial constriction using distinct muscle groupings and concomitantly distinct lip postures. Research has suggested that these lip postures exploit biomechanical regions of nonlinearity between neural activation and movement, also known as quantal regions, to allow movement goals to be realized despite variable activation signals. We present two sets of computer simulations showing that these labial postures can be generated under the assumption of modular control and that the corresponding modules are biomechanically robust: first to variation in the activation levels of participating muscles, and second to interference from surrounding muscles. These results provide support for the hypothesis that biomechanical robustness is an important factor in selecting the muscle groupings used for speech movements and provide insight into the neurological control of speech movements and how biomechanical and functional constraints govern the emergence of speech motor modules. We anticipate that future experimental work guided by biomechanical simulation results will provide new insights into the neural organization of speech movements.

NEW & NOTEWORTHY: This article provides additional evidence that speech motor control is organized in a modular fashion and that biomechanics constrain the kinds of motor modules that may emerge. It also suggests that speech can be a fruitful domain for the study of modularity and that a better understanding of speech motor modules will be useful for speech research. Finally, it suggests that biomechanical modeling can serve as a useful complement to experimental work when studying modularity.
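A toy simulation conveys the quantal idea the paper tests at full biomechanical scale: where the activation-to-posture map plateaus, activation noise barely moves the posture. Everything below (the sigmoid map, the noise level) is an illustrative assumption, not the article's model.

```python
import numpy as np

# Toy quantal map: lip aperture saturates at high muscle activation, so the
# posture is insensitive to activation noise on the plateau.
def aperture(a):
    return 1.0 / (1.0 + np.exp(8.0 * (a - 0.5)))   # sigmoid, arbitrary scale

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.05, size=10_000)
for a0, label in [(0.5, "steep (non-quantal)"), (0.9, "plateau (quantal)")]:
    spread = aperture(a0 + noise).std()             # posture variability
    print(f"activation {a0:.1f} [{label}]: posture s.d. = {spread:.4f}")
```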
Affiliation(s)
- Bryan Gick: Department of Linguistics, University of British Columbia, Vancouver, British Columbia, Canada
- Connor Mayer: Department of Linguistics, University of California, Los Angeles, Los Angeles, California
- Chenhao Chiu: Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan
- Erik Widing: Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Sidney Fels: Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Ian Stavness: Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
5. Shao Y, Hayward V, Visell Y. Compression of dynamic tactile information in the human hand. Science Advances 2020; 6:eaaz1158. PMID: 32494610. PMCID: PMC7159916. DOI: 10.1126/sciadv.aaz1158.
Abstract
A key problem in the study of the senses is to describe how sense organs extract perceptual information from the physics of the environment. We previously observed that dynamic touch elicits mechanical waves that propagate throughout the hand. Here, we show that these waves produce an efficient encoding of tactile information. The computation of an optimal encoding of thousands of naturally occurring tactile stimuli yielded a compact lexicon of primitive wave patterns that sparsely represented the entire dataset, enabling touch interactions to be classified with an accuracy exceeding 95%. The primitive tactile patterns reflected the interplay of hand anatomy with wave physics. Notably, similar patterns emerged when we applied efficient encoding criteria to spiking data from populations of simulated tactile afferents. This finding suggests that the biomechanics of the hand enables efficient perceptual processing by effecting a preneuronal compression of tactile information.
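In the same spirit as the paper's efficient-encoding analysis, the sketch below learns a sparse dictionary of primitive patterns from vibration-like snippets with scikit-learn; the data here are random placeholders, whereas the study used thousands of recordings of natural touch interactions.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Hypothetical stand-in data: rows are vibration snippets sensed across the
# hand (random noise here, purely to make the pipeline runnable).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 128))

# Learn a compact lexicon of primitive patterns with a sparsity penalty.
dico = MiniBatchDictionaryLearning(n_components=40, alpha=1.0, random_state=0)
codes = dico.fit_transform(X)            # sparse coefficients per snippet
print("fraction of nonzero coefficients:", (codes != 0).mean())
```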
Affiliation(s)
- Yitian Shao: Department of Electrical and Computer Engineering, Media Arts and Technology Program, Department of Mechanical Engineering, and California NanoSystems Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
- Vincent Hayward: Sorbonne Université, Institut des Systèmes Intelligents et de Robotique, F-75005 Paris, France; Centre for the Study of the Senses, School of Advanced Study, University of London, London, UK; Actronika SAS, Paris, France
- Yon Visell (corresponding author): Department of Electrical and Computer Engineering, Media Arts and Technology Program, Department of Mechanical Engineering, and California NanoSystems Institute, University of California, Santa Barbara, Santa Barbara, CA, USA
6. Woo J, Xing F, Prince JL, Stone M, Reese TG, Wedeen VJ, El Fakhri G. Identifying the Common and Subject-specific Functional Units of Speech Movements via a Joint Sparse Non-negative Matrix Factorization Framework. Proceedings of SPIE 2020; 11313:113131S. PMID: 32454553. PMCID: PMC7243345.
Abstract
The tongue is capable of producing intelligible speech because of the successful orchestration of muscle groupings, i.e., functional units, of its highly complex musculature over time. Because of the different motions that tongues produce, functional units are transitional structures that transform muscle activity into surface tongue geometry, and they vary significantly from one subject to another. In order to compare and contrast the location and size of functional units in the presence of such substantial inter-person variability, it is essential to study both common and subject-specific functional units in a group of people carrying out the same speech task. In this work, a new normalization technique is presented to simultaneously identify the common and subject-specific functional units of the tongue as tracked by tagged magnetic resonance imaging. To achieve our goal, a joint sparse non-negative matrix factorization framework is used, which learns a set of building blocks and subject-specific as well as common weighting matrices from motion quantities extracted from displacements. A spectral clustering technique is then applied to the subject-specific and common weighting matrices to determine the subject-specific functional units for each subject and the common functional units across subjects. Our experimental results using in vivo tongue motion data show that our approach is able to identify the common and subject-specific functional units with reduced size variability of tongue motion during speech.
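The final clustering step can be sketched directly: voxels are grouped by the similarity of their NMF weighting profiles. Below is a minimal scikit-learn version, assuming a hypothetical (components x voxels) weighting matrix; the paper applies the same idea to both the common and each subject-specific map.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def functional_units(H, n_units=4):
    """Cluster voxels by the similarity of their NMF weighting profiles."""
    V = (H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-9)).T
    affinity = V @ V.T                      # cosine similarity, in [0, 1]
    labels = SpectralClustering(
        n_clusters=n_units, affinity="precomputed", random_state=0
    ).fit_predict(np.clip(affinity, 0.0, 1.0))
    return labels                           # one functional-unit label per voxel

# Hypothetical common weighting map: 8 components over 500 voxels.
H_common = np.abs(np.random.default_rng(0).normal(size=(8, 500)))
common_units = functional_units(H_common)
```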
Affiliation(s)
- Jonghye Woo: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Fangxu Xing: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Jerry L. Prince: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Maureen Stone: Department of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD 21201, USA
- Timothy G. Reese: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02129, USA
- Van J. Wedeen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02129, USA
- Georges El Fakhri: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
7. Parrell B, Ramanarayanan V, Nagarajan S, Houde J. The FACTS model of speech motor control: Fusing state estimation and task-based control. PLoS Computational Biology 2019; 15:e1007321. PMID: 31479444. PMCID: PMC6743785. DOI: 10.1371/journal.pcbi.1007321.
Abstract
We present a new computational model of speech motor control: the Feedback-Aware Control of Tasks in Speech, or FACTS, model. FACTS employs a hierarchical state feedback control architecture to control a simulated vocal tract and produce intelligible speech. The model includes higher-level control of speech tasks and lower-level control of speech articulators. The task controller is modeled as a dynamical system governing the creation of desired constrictions in the vocal tract, after Task Dynamics. Both the task and articulatory controllers rely on an internal estimate of the current state of the vocal tract to generate motor commands. This estimate is derived, based on an efference copy of the applied controls, from a forward model that predicts both the next vocal tract state and the expected auditory and somatosensory feedback. A comparison between predicted feedback and actual feedback is then used to update the internal state estimate. FACTS is able to qualitatively replicate many characteristics of the human speech system: the model is robust to noise in both the sensory and motor pathways, is relatively unaffected by a loss of auditory feedback but is more significantly impacted by the loss of somatosensory feedback, and responds appropriately to externally imposed alterations of auditory and somatosensory feedback. The model also replicates previously hypothesized trade-offs between reliance on auditory and somatosensory feedback, and shows for the first time how this relationship may be mediated by acuity in each sensory domain. These results have important implications for our understanding of the speech motor control system in humans.
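A scalar toy version of this estimation loop is sketched below: a forward model predicts the next state from an efference copy, auditory and somatosensory predictions are compared with noisy feedback, and fixed gains apply the correction. The plant, observation maps, and gains are all invented for illustration; the published model is hierarchical and nonlinear.

```python
import numpy as np

A, B = 0.95, 0.10          # toy linear plant: x' = A x + B u
C_aud, C_som = 1.0, 1.0    # toy auditory / somatosensory observation maps
K_aud, K_som = 0.3, 0.5    # feedback gains (acuity-dependent in the paper)

x_true, x_hat, target = 0.0, 0.0, 1.0
rng = np.random.default_rng(0)
for t in range(50):
    u = 2.0 * (target - x_hat)                  # task-level control on the estimate
    x_true = A * x_true + B * u + rng.normal(0, 0.01)    # plant with motor noise
    x_pred = A * x_hat + B * u                  # forward model (efference copy)
    y_aud = C_aud * x_true + rng.normal(0, 0.05)         # noisy auditory feedback
    y_som = C_som * x_true + rng.normal(0, 0.02)         # noisy somatosensory feedback
    x_hat = x_pred + K_aud * (y_aud - C_aud * x_pred) \
                   + K_som * (y_som - C_som * x_pred)    # correction step
print(f"final state {x_true:.3f}, estimate {x_hat:.3f}, target {target}")
```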
Affiliation(s)
- Benjamin Parrell: Department of Communication Sciences and Disorders, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Vikram Ramanarayanan: Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, California, United States of America; Educational Testing Service R&D, San Francisco, California, United States of America
- Srikantan Nagarajan: Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, California, United States of America; Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, United States of America
- John Houde: Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, California, United States of America
8. Woo J, Xing F, Prince JL, Stone M, Green JR, Goldsmith T, Reese TG, Wedeen VJ, El Fakhri G. Differentiating post-cancer from healthy tongue muscle coordination patterns during speech using deep learning. The Journal of the Acoustical Society of America 2019; 145:EL423. PMID: 31153323. PMCID: PMC6530633. DOI: 10.1121/1.5103191.
Abstract
The ability to differentiate post-cancer from healthy tongue muscle coordination patterns is necessary for the advancement of speech motor control theories and for the development of therapeutic and rehabilitative strategies. A deep learning approach is presented to classify the two groups using muscle coordination patterns from magnetic resonance imaging (MRI). The proposed method uses tagged-MRI to track the tongue's internal tissue points and atlas-driven non-negative matrix factorization to reduce the dimensionality of the deformation fields. A convolutional neural network applied to the classification task yields an accuracy of 96.90%, offering potential for the development of therapeutic and rehabilitative strategies for speech-related disorders.
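For concreteness, a small PyTorch classifier in the spirit of the final stage is sketched below; the input size (a 32 x 32 single-channel grid of NMF-reduced features) and the architecture are assumptions, not the paper's reported network.

```python
import torch
import torch.nn as nn

# Two-class (post-cancer vs healthy) CNN over dimensionality-reduced features.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 2),                # logits for the two groups
)
x = torch.randn(4, 1, 32, 32)                # a dummy mini-batch
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
loss.backward()                              # one optimizer step would follow
```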
Affiliation(s)
- Jonghye Woo: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Fangxu Xing: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Jerry L Prince: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Maureen Stone: Department of Pain and Neural Sciences, University of Maryland Dental School, Baltimore, Maryland 21201, USA
- Jordan R Green: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts 02129, USA
- Tessa Goldsmith: Department of Speech, Language and Swallowing Disorders, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Timothy G Reese: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Van J Wedeen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Georges El Fakhri: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
9. Woo J, Prince JL, Stone M, Xing F, Gomez AD, Green JR, Hartnick CJ, Brady TJ, Reese TG, Wedeen VJ, El Fakhri G. A Sparse Non-Negative Matrix Factorization Framework for Identifying Functional Units of Tongue Behavior From MRI. IEEE Transactions on Medical Imaging 2019; 38:730-740. PMID: 30235120. PMCID: PMC6422735. DOI: 10.1109/tmi.2018.2870939.
Abstract
Muscle coordination patterns of lingual behaviors are synergies generated by deforming local muscle groups in a variety of ways. Functional units are functional muscle groups of local structural elements within the tongue that compress, expand, and move in a cohesive and consistent manner. Identifying the functional units using tagged magnetic resonance imaging (MRI) sheds light on the mechanisms of normal and pathological muscle coordination patterns, yielding improvements in surgical planning, treatment, and rehabilitation procedures. In this paper, to mine this information, we propose a matrix factorization and probabilistic graphical model framework to produce building blocks and their associated weighting map using motion quantities extracted from tagged-MRI. Our tagged-MRI acquisition and accurate voxel-level tracking provide previously unavailable internal tongue motion patterns, thus revealing the inner workings of the tongue during speech and other lingual behaviors. We then employ spectral clustering on the weighting map to identify the cohesive regions defined by the tongue motion, which may involve multiple or undocumented regions. To evaluate our method, we perform a series of experiments. We first use two-dimensional images and synthetic data to demonstrate the accuracy of our method. We then use three-dimensional synthetic and in vivo tongue motion data, using protrusion and simple speech tasks, to identify subject-specific and data-driven functional units of the tongue in localized regions.
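The factorization step can be summarized in a few lines: Euclidean NMF with an L1 penalty on the weighting map, solved by multiplicative updates. This is a minimal sketch of the generic sparse-NMF core, assuming a nonnegative motion-feature matrix; the paper embeds it in a probabilistic graphical model before the spectral-clustering stage.

```python
import numpy as np

def sparse_nmf(X, k, lam=0.1, n_iter=300, eps=1e-9):
    """Multiplicative updates for min_{W,H >= 0} ||X - W H||_F^2 + lam * sum(H).
    The L1 term on H encourages each voxel to load on few building blocks."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
    return W, H

# Hypothetical motion features: 60 feature dimensions over 400 voxels.
X = np.abs(np.random.default_rng(1).normal(size=(60, 400)))
W, H = sparse_nmf(X, k=6)
```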
Affiliation(s)
- Jonghye Woo: Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Jerry L. Prince: Department of Electrical and Computer Engineering, Johns Hopkins University
- Fangxu Xing: Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Arnold D. Gomez: Department of Electrical and Computer Engineering, Johns Hopkins University
- Thomas J. Brady: Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Timothy G. Reese: Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Van J. Wedeen: Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Georges El Fakhri: Department of Radiology, Massachusetts General Hospital and Harvard Medical School
10. Mackevicius EL, Bahle AH, Williams AH, Gu S, Denisenko NI, Goldman MS, Fee MS. Unsupervised discovery of temporal sequences in high-dimensional datasets, with applications to neuroscience. eLife 2019; 8:e38471. PMID: 30719973. PMCID: PMC6363393. DOI: 10.7554/elife.38471.
Abstract
Identifying low-dimensional features that describe large-scale neural recordings is a major challenge in neuroscience. Repeated temporal patterns (sequences) are thought to be a salient feature of neural dynamics, but are not succinctly captured by traditional dimensionality reduction techniques. Here, we describe a software toolbox—called seqNMF—with new methods for extracting informative, non-redundant, sequences from high-dimensional neural data, testing the significance of these extracted patterns, and assessing the prevalence of sequential structure in data. We test these methods on simulated data under multiple noise conditions, and on several real neural and behavioral data sets. In hippocampal data, seqNMF identifies neural sequences that match those calculated manually by reference to behavioral events. In songbird data, seqNMF discovers neural sequences in untutored birds that lack stereotyped songs. Thus, by identifying temporal structure directly from neural data, seqNMF enables dissection of complex neural circuits without relying on temporal references from stimuli or behavioral outputs.
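The convolutive factorization behind seqNMF is compact enough to write out: each factor is a (neurons x lags) template whose activations are convolved in time. The sketch below implements only the reconstruction, with made-up sizes; seqNMF adds multiplicative updates plus a cross-orthogonality penalty that removes redundant sequences.

```python
import numpy as np

def conv_nmf_reconstruct(W, H):
    """Convolutive NMF reconstruction: X_hat[:, t] = sum_l W[:, :, l] @ H[:, t - l].
    W: (n_neurons, n_factors, n_lags); H: (n_factors, n_time)."""
    n_neurons, n_factors, n_lags = W.shape
    n_time = H.shape[1]
    X_hat = np.zeros((n_neurons, n_time))
    for l in range(n_lags):
        X_hat[:, l:] += W[:, :, l] @ H[:, : n_time - l]  # shift H right by l bins
    return X_hat

# Hypothetical sizes: 50 neurons, 3 sequence factors, 20-bin templates.
rng = np.random.default_rng(0)
W = rng.random((50, 3, 20))
H = rng.random((3, 1000))
X_hat = conv_nmf_reconstruct(W, H)
```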
Affiliation(s)
- Emily L Mackevicius: McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
- Andrew H Bahle: McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
- Alex H Williams: Neurosciences Program, Stanford University, Stanford, United States
- Shijie Gu: McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States; School of Life Sciences and Technology, ShanghaiTech University, Shanghai, China
- Natalia I Denisenko: McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
- Mark S Goldman: Center for Neuroscience, Department of Neurobiology, Physiology and Behavior, University of California, Davis, Davis, United States; Department of Ophthalmology and Vision Science, University of California, Davis, Davis, United States
- Michale S Fee: McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, United States
11. Ramanarayanan V, Tilsen S, Proctor M, Töger J, Goldstein L, Nayak KS, Narayanan S. Analysis of speech production real-time MRI. Computer Speech & Language 2018. DOI: 10.1016/j.csl.2018.04.002.
12. Lee E, Xing F, Ahn S, Reese TG, Wang R, Green JR, Atassi N, Wedeen VJ, El Fakhri G, Woo J. Magnetic resonance imaging based anatomical assessment of tongue impairment due to amyotrophic lateral sclerosis: A preliminary study. The Journal of the Acoustical Society of America 2018; 143:EL248. PMID: 29716267. PMCID: PMC5895467. DOI: 10.1121/1.5030134.
Abstract
Amyotrophic Lateral Sclerosis (ALS) is a neurological disorder that impairs tongue function for speech and swallowing. A widely used diffusion tensor imaging (DTI) analysis pipeline is employed to quantify differences in tongue fiber myoarchitecture between controls and ALS patients. The pipeline uses both high-resolution magnetic resonance imaging (hMRI) and DTI: hMRI is used to delineate tongue muscles, while DTI provides indices that reveal fiber connectivity within and between muscles. Preliminary results from five controls and two patients show quantitative differences between the groups. This work has the potential to provide insights into the detrimental effects of ALS on speech and swallowing.
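One of the standard DTI indices such a pipeline reports is fractional anisotropy (FA), computed per voxel from the eigenvalues of the diffusion tensor; a minimal implementation of the textbook formula:

```python
import numpy as np

def fractional_anisotropy(evals):
    """FA = sqrt(3/2 * sum((l_i - MD)^2) / sum(l_i^2)) for eigenvalues l1..l3."""
    l1, l2, l3 = evals
    md = (l1 + l2 + l3) / 3.0                              # mean diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1**2 + l2**2 + l3**2
    return np.sqrt(1.5 * num / den) if den > 0 else 0.0

print(fractional_anisotropy((1.7e-3, 0.3e-3, 0.3e-3)))    # elongated voxel: FA ~ 0.8
```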
Affiliation(s)
- Euna Lee: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Fangxu Xing: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Sung Ahn: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Timothy G Reese: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02129, USA
- Ruopeng Wang: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02129, USA
- Jordan R Green: Department of Communication Sciences and Disorders, MGH Institute of Health Professions, Boston, Massachusetts 02129, USA
- Nazem Atassi: Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Van J Wedeen: Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02129, USA
- Georges El Fakhri: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
- Jonghye Woo: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts 02114, USA
13. MUPET-Mouse Ultrasonic Profile ExTraction: A Signal Processing Tool for Rapid and Unsupervised Analysis of Ultrasonic Vocalizations. Neuron 2017; 94:465-485.e5. PMID: 28472651. DOI: 10.1016/j.neuron.2017.04.005.
Abstract
Vocalizations play a significant role in social communication across species. Analyses in rodents have used a limited number of spectro-temporal measures to compare ultrasonic vocalizations (USVs), which limits the ability to address repertoire complexity in the context of behavioral states. Using an automated and unsupervised signal processing approach, we report the development of MUPET (Mouse Ultrasonic Profile ExTraction) software, an open-access MATLAB tool that provides data-driven, high-throughput analyses of USVs. MUPET measures, learns, and compares syllable types and provides an automated time stamp of syllable events. Using USV data from a large mouse genetic reference panel and open-source datasets produced in different social contexts, MUPET analyzes the fine details of syllable production and repertoire use. MUPET thus serves as a new tool for USV repertoire analyses, with the capability to be adapted for use with other species.
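A crude sketch of the first stage of this kind of tool, detecting candidate syllables as runs of supra-threshold ultrasonic-band energy, is shown below; the sampling rate, band limits, and threshold are illustrative choices, and MUPET's actual detection and syllable modeling are more elaborate.

```python
import numpy as np
from scipy.signal import spectrogram

def detect_usv_syllables(x, fs=250_000, fmin=35_000, fmax=110_000):
    """Flag time bins whose ultrasonic-band energy exceeds a noise threshold,
    then return (onset, offset) times of contiguous supra-threshold runs."""
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=384)
    band = S[(f >= fmin) & (f <= fmax)].sum(axis=0)     # in-band energy per bin
    active = band > band.mean() + 2.0 * band.std()      # simple threshold
    changes = np.flatnonzero(np.diff(np.r_[0, active.astype(int), 0]))
    return [(t[a], t[b - 1]) for a, b in zip(changes[::2], changes[1::2])]

x = np.random.default_rng(0).normal(size=250_000)       # 1 s of synthetic noise
print(detect_usv_syllables(x))
```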
14. Ramanarayanan V, Van Segbroeck M, Narayanan SS. Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. Computer Speech & Language 2016; 36:330-346. PMID: 26688612. DOI: 10.1016/j.csl.2015.03.004.
Abstract
How the speech production and perception systems evolved in humans remains a mystery. Previous research suggests that human auditory systems are able, and have possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper takes an initial step towards answering the complementary question of whether speakers' articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end, we explicitly model, using computational methods, the extent to which derived representations of "primitive movements" of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e., the trajectories of vocal tract articulators over time. Specifically, we propose a weakly supervised learning method that finds a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases over different time-lags. We show that this feature, derived entirely from activations of these primitive movements, achieves greater discrimination than conventional features on an interval-based phone classification task. We discuss the implications of these findings for furthering our understanding of speech signal representations and the links between speech production and perception systems.
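The co-occurrence feature is straightforward to sketch: for each phone interval, accumulate basis-by-basis co-activation at a range of time-lags and vectorize. The array sizes below are placeholders, not the paper's settings.

```python
import numpy as np

def lag_cooccurrence(H, max_lag=5):
    """Feature for one phone interval: co-occurrence of basis activations
    across time-lags. H is (n_bases x n_frames) of primitive activations."""
    feats = []
    for lag in range(max_lag + 1):
        a, b = H[:, : H.shape[1] - lag], H[:, lag:]
        feats.append((a @ b.T).ravel())      # basis-by-basis co-activation
    return np.concatenate(feats)

H = np.random.default_rng(0).random((10, 40))   # dummy activations
phi = lag_cooccurrence(H)                       # would feed a phone classifier
print(phi.shape)                                # (6 * 10 * 10,) = (600,)
```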
Affiliation(s)
- Vikram Ramanarayanan: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089
- Maarten Van Segbroeck: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089
- Shrikanth S Narayanan: Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA 90089
15.
16. Parmiggiani A, Randazzo M, Maggiali M, Metta G, Elisei F, Bailly G. Design and Validation of a Talking Face for the iCub. International Journal of Humanoid Robotics 2015. DOI: 10.1142/s0219843615500267.
Abstract
Recent developments in human–robot interaction show how the ability to communicate with people in a natural way is of great importance for artificial agents. The implementation of facial expressions has been found to significantly increase the interaction capabilities of humanoid robots. For speech, displaying a correct articulation with sound is mandatory to avoid audiovisual illusions like the McGurk effect (leading to comprehension errors) as well as to enhance the intelligibility in noisy conditions. This work describes the design, construction and testing of an animatronic talking face developed for the iCub robot. This talking head has an articulated jaw and four independent lip movements actuated by five motors. It is covered by a specially designed elastic tissue cover whose hemlines at the lips are attached to the motors via connecting linkages. The mechanical design and the control scheme have been evaluated by speech intelligibility in noise (SPIN) perceptual tests that demonstrate an absolute 10% intelligibility gain provided by the jaw and lip movements over the audio-only display.
Affiliation(s)
- Alberto Parmiggiani: iCub Facility, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Marco Randazzo: iCub Facility, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Marco Maggiali: iCub Facility, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Giorgio Metta: iCub Facility, Fondazione Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Frederic Elisei: GIPSA-lab, Speech & Cognition Dept., CNRS/Univ. Grenoble Alpes, France
- Gerard Bailly: GIPSA-lab, Speech & Cognition Dept., CNRS/Univ. Grenoble Alpes, France
17. Gibert G, Olsen KN, Leung Y, Stevens CJ. Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis. Computational Cognitive Science 2015; 1:7. PMID: 27980889. PMCID: PMC5125409. DOI: 10.1186/s40469-015-0007-8.
Abstract
Background: Virtual humans have become part of our everyday life (movies, internet, and computer games). Even though they are becoming more and more realistic, their speech capabilities are usually limited, being incoherent and/or asynchronous with the corresponding acoustic signal.

Methods: We describe a method to convert a virtual human avatar (animated through key frames and interpolation) into a more naturalistic talking head. Speech articulation cannot be accurately replicated by interpolation between key frames; talking heads with good speech capabilities are derived from real speech production data. Motion capture data are commonly used to provide accurate facial motion for the visible speech articulators (jaw and lips) synchronously with acoustics. To access tongue trajectories (a partially occluded speech articulator), electromagnetic articulography (EMA) is often used. We recorded a large database of phonetically balanced English sentences with synchronous EMA, motion capture data, and acoustics. An articulatory model was computed on this database to recover missing data and to provide 'normalized' animation (i.e., articulatory) parameters. In addition, semi-automatic segmentation was performed on the acoustic stream. A dictionary of multimodal Australian English diphones was created, composed of the variation of the articulatory parameters between all successive stable allophones.

Results: The avatar's facial key frames were converted into articulatory parameters steering its speech articulators (jaw, lips, and tongue). The speech production database was used to drive the Embodied Conversational Agent (ECA) and to enhance its speech capabilities. A text-to-auditory-visual-speech synthesizer was created based on the MaryTTS software and on the diphone dictionary derived from the speech production database.

Conclusions: We describe a method to transform an ECA with a generic tongue model and keyframe animation into a talking head that displays naturalistic tongue, jaw, and lip motions. Thanks to a multimodal speech production database, a text-to-auditory-visual-speech synthesizer drives the ECA's facial movements, enhancing its speech capabilities.
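As a rough illustration of the concatenation step, the sketch below stitches hypothetical diphone trajectories of articulatory parameters with a short linear crossfade at each join; the dictionary contents, units, and smoothing are all invented, and the described synthesizer performs unit selection and smoothing far more carefully.

```python
import numpy as np

# Toy diphone dictionary: label -> (frames x parameters) trajectory.
rng = np.random.default_rng(0)
diphones = {"h-e": rng.random((12, 6)), "e-l": rng.random((10, 6)),
            "l-o": rng.random((14, 6))}

def concatenate(units, xfade=3):
    """Join successive diphone trajectories with a linear crossfade."""
    out = diphones[units[0]].copy()
    for name in units[1:]:
        nxt = diphones[name]
        w = np.linspace(0.0, 1.0, xfade)[:, None]          # crossfade weights
        out[-xfade:] = (1 - w) * out[-xfade:] + w * nxt[:xfade]
        out = np.vstack([out, nxt[xfade:]])
    return out                                             # frames x parameters

traj = concatenate(["h-e", "e-l", "l-o"])
```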
Affiliation(s)
- Guillaume Gibert: The MARCS Institute, University of Western Sydney, Locked Bag 1797, Penrith, NSW 2751, Australia; INSERM U846, 18 avenue Doyen Lépine, 69500 Bron, France; Stem Cell and Brain Research Institute, 69500 Bron, France; Université de Lyon, Université Lyon 1, 69003 Lyon, France
- Kirk N Olsen: The MARCS Institute, University of Western Sydney, Locked Bag 1797, Penrith, NSW 2751, Australia
- Yvonne Leung: The MARCS Institute, University of Western Sydney, Locked Bag 1797, Penrith, NSW 2751, Australia
- Catherine J Stevens: The MARCS Institute, University of Western Sydney, Locked Bag 1797, Penrith, NSW 2751, Australia
18. Determining functional units of tongue motion via graph-regularized sparse non-negative matrix factorization. Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2014. PMID: 25485373. DOI: 10.1007/978-3-319-10470-6_19.
Abstract
Tongue motion during speech and swallowing involves synergies of locally deforming regions, or functional units. Motion clustering during tongue motion can be used to reveal the tongue's intrinsic functional organization. A novel matrix factorization and clustering method for tissues tracked using tagged magnetic resonance imaging (tMRI) is presented. Functional units are estimated using a graph-regularized sparse non-negative matrix factorization framework, learning latent building blocks and the corresponding weighting map from motion features derived from tissue displacements. Spectral clustering using the weighting map is then performed to determine the coherent regions, i.e., functional units, defined by the tongue motion. Two-dimensional image data are used to verify that the proposed algorithm clusters the different types of images accurately. Three-dimensional tMRI data from five subjects carrying out simple non-speech/speech tasks are analyzed to show how the proposed approach defines a subject/task-specific functional parcellation of the tongue in localized regions.
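A common formulation of the factorization this paper builds on adds a graph-Laplacian term to sparse NMF. The multiplicative updates below follow that generic recipe (cf. graph-regularized NMF) under assumed inputs; they are a sketch, not the paper's exact algorithm.

```python
import numpy as np

def graph_sparse_nmf(X, A, k, lam=0.1, gamma=0.5, n_iter=300, eps=1e-9):
    """Updates for min_{W,H>=0} ||X - WH||^2 + lam*sum(H) + gamma*tr(H L H^T),
    where L = D - A is the Laplacian of a voxel-adjacency matrix A, so that
    neighboring voxels receive similar weighting profiles."""
    D = np.diag(A.sum(axis=1))
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + gamma * H @ A) / (W.T @ W @ H + gamma * H @ D + lam + eps)
    return W, H

# Toy data: 100-dim motion features over 200 voxels with a chain-graph adjacency.
X = np.random.default_rng(1).random((100, 200))
A = np.eye(200, k=1) + np.eye(200, k=-1)
W, H = graph_sparse_nmf(X, A, k=5)
```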
19.
Affiliation(s)
- Bryan Gick: Department of Linguistics, University of British Columbia, Vancouver, BC, Canada; Haskins Laboratories, New Haven, CT, USA
- Ian Stavness: Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada