1
Jeng FC, Matzdorf K, Hickman KL, Bauer SW, Carriero AE, McDonald K, Lin TH, Wang CY. Advancing Auditory Processing by Detecting Frequency-Following Responses Through a Specialized Machine Learning Model. Percept Mot Skills 2024; 131:417-431. [PMID: 38153030] [DOI: 10.1177/00315125231225767]
Abstract
In this study, we explore the feasibility and performance of detecting scalp-recorded frequency-following responses (FFRs) with a specialized machine learning (ML) model. By leveraging the feature-extraction strengths of the source-separation non-negative matrix factorization (SSNMF) algorithm and its adeptness in handling limited training data, we adapted the SSNMF algorithm into a specialized ML model with a hybrid architecture to enhance FFR detection amidst background noise. We recruited 40 adults with normal hearing and evoked their scalp-recorded FFRs using the English vowel /i/ with a rising pitch contour. The model was trained on FFR-present and FFR-absent conditions, and its performance was evaluated using sensitivity, specificity, efficiency, false-positive rate, and false-negative rate metrics. This study revealed that the specialized SSNMF model achieved heightened sensitivity, specificity, and efficiency in detecting FFRs as the number of recording sweeps increased. Sensitivity exceeded 80% at 500 sweeps and remained above 89% from 1000 sweeps onwards. Similarly, specificity and efficiency also improved rapidly with increasing sweeps. The progressively enhanced sensitivity, specificity, and efficiency of this specialized ML model underscore its practicality and potential for broader applications. These findings have immediate implications for FFR research and clinical use, while paving the way for further advancements in the assessment of auditory processing.
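The five performance metrics named in this abstract reduce to simple ratios over the four detection outcomes. A minimal sketch (function and variable names are illustrative, not taken from the study's code):

```python
# Hypothetical sketch of the detection metrics reported for an FFR
# detector: sensitivity, specificity, efficiency (overall accuracy),
# false-positive rate, and false-negative rate.

def detection_metrics(tp, fn, tn, fp):
    """Compute the five metrics from counts of true/false positives/negatives."""
    sensitivity = tp / (tp + fn)                   # hit rate on FFR-present recordings
    specificity = tn / (tn + fp)                   # correct-rejection rate on FFR-absent recordings
    efficiency = (tp + tn) / (tp + fn + tn + fp)   # overall proportion correct
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (fn + tp)
    return sensitivity, specificity, efficiency, false_positive_rate, false_negative_rate

# Example with made-up counts: 89 hits, 11 misses, 92 correct rejections, 8 false alarms
sens, spec, eff, fpr, fnr = detection_metrics(89, 11, 92, 8)
```

Note that sensitivity and false-negative rate are complements (as are specificity and false-positive rate), which is why detectors are usually summarized by only the first three numbers.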
Affiliation(s)
- Fuh-Cherng Jeng
- Communication Sciences and Disorders, Ohio University, Athens, OH, USA
- Communication Sciences and Disorders, Asia University, Taichung, Taiwan
- Katie Matzdorf
- Communication Sciences and Disorders, Ohio University, Athens, OH, USA
- Kassy L Hickman
- Communication Sciences and Disorders, Ohio University, Athens, OH, USA
- Sydney W Bauer
- Communication Sciences and Disorders, Ohio University, Athens, OH, USA
- Amanda E Carriero
- Communication Sciences and Disorders, Ohio University, Athens, OH, USA
- Kalyn McDonald
- Communication Sciences and Disorders, Ohio University, Athens, OH, USA
- Tzu-Hao Lin
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Ching-Yuan Wang
- Department of Otolaryngology-HNS, China Medical University Hospital, Taichung, Taiwan
2
Zhu Y, Li C, Hendry C, Glass J, Canseco-Gonzalez E, Pitts MA, Dykstra AR. Isolating Neural Signatures of Conscious Speech Perception with a No-Report Sine-Wave Speech Paradigm. J Neurosci 2024; 44:e0145232023. [PMID: 38191569] [PMCID: PMC10883607] [DOI: 10.1523/jneurosci.0145-23.2023]
Abstract
Identifying neural correlates of conscious perception is a fundamental endeavor of cognitive neuroscience. Most studies so far have focused on visual awareness along with trial-by-trial reports of task-relevant stimuli, which can confound neural measures of perceptual awareness with postperceptual processing. Here, we used a three-phase sine-wave speech paradigm that dissociated between conscious speech perception and task relevance while recording EEG in humans of both sexes. Compared with tokens perceived as noise, physically identical sine-wave speech tokens that were perceived as speech elicited a left-lateralized, near-vertex negativity, which we interpret as a phonological version of a perceptual awareness negativity. This response appeared between 200 and 300 ms after token onset and was not present for frequency-flipped control tokens that were never perceived as speech. In contrast, the P3b elicited by task-irrelevant tokens did not significantly differ when the tokens were perceived as speech versus noise and was only enhanced for tokens that were both perceived as speech and relevant to the task. Our results extend the findings from previous studies on visual awareness and speech perception and suggest that correlates of conscious perception, across types of conscious content, are most likely to be found in midlatency negative-going brain responses in content-specific sensory areas.
Affiliation(s)
- Yunkai Zhu
- Department of Biomedical Engineering, University of Miami, Coral Gables, Florida 33143
- Charlotte Li
- Department of Psychology, Reed College, Portland, Oregon 97202
- Camille Hendry
- Department of Psychology, Reed College, Portland, Oregon 97202
- James Glass
- Department of Psychology, Reed College, Portland, Oregon 97202
- Michael A Pitts
- Department of Psychology, Reed College, Portland, Oregon 97202
- Andrew R Dykstra
- Department of Biomedical Engineering, University of Miami, Coral Gables, Florida 33143
3
Giordano AT, Jeng FC, Black TR, Bauer SW, Carriero AE, McDonald K, Lin TH, Wang CY. Effects of Silent Intervals on the Extraction of Human Frequency-Following Responses Using Non-Negative Matrix Factorization. Percept Mot Skills 2023; 130:1834-1851. [PMID: 37534595] [DOI: 10.1177/00315125231191303]
Abstract
Source-Separation Non-Negative Matrix Factorization (SSNMF) is a mathematical algorithm recently developed to extract scalp-recorded frequency-following responses (FFRs) from noise. Despite its initial success, the effects of silent intervals on algorithm performance remained undetermined. Our purpose in this study was to determine the effects of silent intervals on the extraction of FFRs, which are electrophysiological responses commonly used to evaluate auditory processing and neuroplasticity in the human brain. We used an English vowel /i/ with a rising frequency contour to evoke FFRs in 23 normal-hearing adults. The stimulus had a duration of 150 ms, while the silent interval between the offset of one stimulus and the onset of the next was also 150 ms. We computed FFR Enhancement and Noise Residue to estimate algorithm performance, while silent intervals were either included (i.e., the WithSI condition) or excluded (i.e., the WithoutSI condition) in our analysis. The FFR Enhancements and Noise Residues obtained in the WithoutSI condition were significantly better (p < .05) than those obtained in the WithSI condition. On average, excluding silent intervals produced an 11.78% increase in FFR Enhancement and a 20.69% decrease in Noise Residue. These results not only quantify the effects of silent intervals on the extraction of human FFRs, but also provide recommendations for designing and improving the SSNMF algorithm in future research.
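The core factorization step can be illustrated with a plain multiplicative-update NMF on a synthetic magnitude spectrogram. This is a generic sketch only: the authors' SSNMF variant, their FFR Enhancement and Noise Residue formulas, and their silent-interval handling are not reproduced here, and all data below are fabricated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake magnitude spectrogram: 64 frequency bins x 100 time frames,
# one tonal "response" row plus broadband non-negative noise.
response = np.outer(np.eye(64)[10], np.ones(100)) * 5.0
V = response + rng.random((64, 100))

def nmf(V, k, n_iter=200, eps=1e-9):
    """Plain multiplicative-update NMF: V ~ W @ H, all entries non-negative."""
    n, m = V.shape
    init = np.random.default_rng(1)
    W = init.random((n, k)) + eps
    H = init.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H

W, H = nmf(V, k=2)       # hope one component captures the tone, one the noise
V_hat = W @ H            # denoised reconstruction

# Excluding silent intervals, as the study above recommends, would simply
# mean dropping the silent time frames (columns of V) before fitting.
```

The multiplicative updates keep every entry of W and H non-negative by construction, which is what lets the factors be read as spectra and activations.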
Affiliation(s)
- Allison T Giordano
- Communication Sciences and Disorders, Ohio University, Athens, Ohio, USA
- Fuh-Cherng Jeng
- Communication Sciences and Disorders, Ohio University, Athens, Ohio, USA
- Taylor R Black
- Communication Sciences and Disorders, Ohio University, Athens, Ohio, USA
- Sydney W Bauer
- Communication Sciences and Disorders, Ohio University, Athens, Ohio, USA
- Amanda E Carriero
- Communication Sciences and Disorders, Ohio University, Athens, Ohio, USA
- Kalyn McDonald
- Communication Sciences and Disorders, Ohio University, Athens, Ohio, USA
- Tzu-Hao Lin
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Ching-Yuan Wang
- Department of Otolaryngology-HNS, China Medical University Hospital, Taichung, Taiwan
4
Xu C, Cheng FY, Medina S, Eng E, Gifford R, Smith S. Objective discrimination of bimodal speech using frequency following responses. Hear Res 2023; 437:108853. [PMID: 37441879] [DOI: 10.1016/j.heares.2023.108853]
Abstract
Bimodal hearing, in which a contralateral hearing aid is combined with a cochlear implant (CI), provides greater speech recognition benefits than using a CI alone. Factors predicting individual bimodal patient success are not fully understood. Previous studies have shown that bimodal benefits may be driven by a patient's ability to extract fundamental frequency (f0) and/or temporal fine structure cues (e.g., F1). Both of these features may be represented in frequency following responses (FFR) to bimodal speech. Thus, the goals of this study were to: 1) parametrically examine neural encoding of f0 and F1 in simulated bimodal speech conditions; 2) examine objective discrimination of FFRs to bimodal speech conditions using machine learning; 3) explore whether FFRs are predictive of perceptual bimodal benefit. Three vowels (/ε/, /i/, and /ʊ/) with identical f0 were manipulated by a vocoder (right ear) and low-pass filters (left ear) to create five bimodal simulations for evoking FFRs: Vocoder-only, Vocoder +125 Hz, Vocoder +250 Hz, Vocoder +500 Hz, and Vocoder +750 Hz. Perceptual performance on the BKB-SIN test was also measured using the same five configurations. Results suggested that neural representations of the f0 and F1 FFR components were enhanced with increasing acoustic bandwidth in the simulated "non-implanted" ear. As spectral differences between vowels emerged in the FFRs with increased acoustic bandwidth, FFRs were more accurately classified and discriminated using a machine learning algorithm. Enhancement of f0 and F1 neural encoding with increasing bandwidth was collectively predictive of perceptual bimodal benefit on a speech-in-noise task. Given these results, FFR may be a useful tool to objectively assess individual variability in bimodal hearing.
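The "objective discrimination" idea can be sketched with a toy nearest-template classifier on simulated vowel responses sharing f0 but differing in F1. Everything here is invented for illustration: the study's vocoded stimuli, real FFR recordings, and actual classifier are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 2000
t = np.arange(1000) / fs  # 0.5 s at 2 kHz (toy sampling rate)

def fake_ffr(f0, f1, noise_sd=0.5):
    """Toy 'FFR': tone at f0 plus a weaker tone at formant F1, in noise."""
    return (np.sin(2 * np.pi * f0 * t)
            + 0.5 * np.sin(2 * np.pi * f1 * t)
            + noise_sd * rng.standard_normal(t.size))

# Three vowel-like tokens sharing f0 = 100 Hz but differing in F1,
# loosely mirroring the shared-f0 vowel design described above.
vowels = {"eh": (100, 550), "i": (100, 300), "uh": (100, 450)}

def avg_spectrum(f0, f1, n_trials=50):
    """Magnitude spectrum of an n_trials-averaged response."""
    avg = np.mean([fake_ffr(f0, f1) for _ in range(n_trials)], axis=0)
    return np.abs(np.fft.rfft(avg))

templates = {v: avg_spectrum(*p) for v, p in vowels.items()}

def classify(spectrum):
    """Assign the vowel whose template spectrum is most cosine-similar."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(templates, key=lambda v: cos(spectrum, templates[v]))

pred = classify(avg_spectrum(100, 300))  # a fresh "/i/"-like average
```

Because all tokens share the 100 Hz f0 peak, discrimination here rides entirely on the F1 region of the spectrum, mirroring the abstract's point that classification improves once spectral differences between vowels emerge in the FFRs.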
Affiliation(s)
- Can Xu
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, 2504A Whitis Ave. (A1100), Austin 78712-0114, TX, USA
- Fan-Yin Cheng
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, 2504A Whitis Ave. (A1100), Austin 78712-0114, TX, USA
- Sarah Medina
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, 2504A Whitis Ave. (A1100), Austin 78712-0114, TX, USA
- Erica Eng
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, 2504A Whitis Ave. (A1100), Austin 78712-0114, TX, USA
- René Gifford
- Department of Speech, Language, and Hearing Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
- Spencer Smith
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, 2504A Whitis Ave. (A1100), Austin 78712-0114, TX, USA.
5
Rizzi R, Bidelman GM. Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech. Cereb Cortex 2023; 33:10076-10086. [PMID: 37522248] [PMCID: PMC10502779] [DOI: 10.1093/cercor/bhad266]
Abstract
So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- versus high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" versus "ga." The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.
Affiliation(s)
- Rose Rizzi
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Gavin M Bidelman
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- Cognitive Science Program, Indiana University, Bloomington, IN, United States
6
Rizzi R, Bidelman GM. Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech. bioRxiv [Preprint] 2023:2023.05.09.540018. [PMID: 37214801] [PMCID: PMC10197666] [DOI: 10.1101/2023.05.09.540018]
Abstract
So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- vs. high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" vs. "ga". The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.
Affiliation(s)
- Rose Rizzi
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA
- Gavin M. Bidelman
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Cognitive Science Program, Indiana University, Bloomington, IN, USA
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA
7
Carter JA, Bidelman GM. Perceptual warping exposes categorical representations for speech in human brainstem responses. Neuroimage 2023; 269:119899. [PMID: 36720437] [PMCID: PMC9992300] [DOI: 10.1016/j.neuroimage.2023.119899]
Abstract
The brain transforms continuous acoustic events into discrete category representations to downsample the speech signal for our perceptual-cognitive systems. Such phonetic categories are highly malleable, and their percepts can change depending on surrounding stimulus context. Previous work suggests this acoustic-phonetic mapping and the perceptual warping of speech emerge in the brain no earlier than auditory cortex. Here, we examined whether these auditory-category phenomena inherent to speech perception occur even earlier in the human brain, at the level of auditory brainstem. We recorded speech-evoked frequency following responses (FFRs) during a task designed to induce more or less warping of listeners' perceptual categories depending on stimulus presentation order of a speech continuum (random, forward, backward directions). We used a novel clustered stimulus paradigm to rapidly record the high trial counts needed for FFRs concurrent with active behavioral tasks. We found serial stimulus order caused perceptual shifts (hysteresis) near listeners' category boundary, confirming identical speech tokens are perceived differentially depending on stimulus context. Critically, we further show neural FFRs during active (but not passive) listening are enhanced for prototypical vs. category-ambiguous tokens and are biased in the direction of listeners' phonetic label even for acoustically-identical speech stimuli. These findings were not observed in the stimulus acoustics nor in model FFR responses generated via a computational model of cochlear and auditory nerve transduction, confirming a central origin to the effects. Our data reveal FFRs carry category-level information and suggest top-down processing actively shapes the neural encoding and categorization of speech at subcortical levels.
These findings suggest the acoustic-phonetic mapping and perceptual warping in speech perception occur surprisingly early along the auditory neuroaxis, which might aid understanding by reducing ambiguity inherent to the speech signal.
Affiliation(s)
- Jared A Carter
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA; Division of Clinical Neuroscience, School of Medicine, Hearing Sciences - Scottish Section, University of Nottingham, Glasgow, Scotland, UK
- Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA.
8
Lai J, Price CN, Bidelman GM. Brainstem speech encoding is dynamically shaped online by fluctuations in cortical α state. Neuroimage 2022; 263:119627. [PMID: 36122686] [PMCID: PMC10017375] [DOI: 10.1016/j.neuroimage.2022.119627]
Abstract
Experimental evidence in animals demonstrates cortical neurons innervate subcortex bilaterally to tune brainstem auditory coding. Yet, the role of the descending (corticofugal) auditory system in modulating earlier sound processing in humans during speech perception remains unclear. Here, we measured EEG activity as listeners performed speech identification tasks in different noise backgrounds designed to tax perceptual and attentional processing. We hypothesized brainstem speech coding might be tied to attention and arousal states (indexed by cortical α power) that actively modulate the interplay of brainstem-cortical signal processing. When speech-evoked brainstem frequency-following responses (FFRs) were categorized according to cortical α states, we found low α FFRs in noise were weaker, correlated positively with behavioral response times, and were more "decodable" via neural classifiers. Our data provide new evidence for online corticofugal interplay in humans and establish that brainstem sensory representations are continuously yoked to (i.e., modulated by) the ebb and flow of cortical states to dynamically update perceptual processing.
Affiliation(s)
- Jesyin Lai
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA; Diagnostic Imaging Department, St. Jude Children's Research Hospital, Memphis, TN, USA.
- Caitlin N Price
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA; Department of Audiology and Speech Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA; Department of Speech, Language and Hearing Sciences, Indiana University, 2631 East Discovery Parkway, Bloomington, IN 47408, USA; Program in Neuroscience, Indiana University, 1101 E 10th St, Bloomington, IN 47405, USA.
9
Jeng FC, Jeng YS. Implementation of Machine Learning on Human Frequency-Following Responses: A Tutorial. Semin Hear 2022; 43:251-274. [PMID: 36313046] [PMCID: PMC9605809] [DOI: 10.1055/s-0042-1756219]
Abstract
The frequency-following response (FFR) provides enriched information on how acoustic stimuli are processed in the human brain. Based on recent studies, machine learning techniques have demonstrated great utility in modeling human FFRs. This tutorial focuses on the fundamental principles, algorithmic designs, and custom implementations of several supervised models (linear regression, logistic regression, k-nearest neighbors, support vector machines) and an unsupervised model (k-means clustering). Other useful machine learning tools (Markov chains, dimensionality reduction, principal components analysis, nonnegative matrix factorization, and neural networks) are discussed as well. Each model's applicability and its pros and cons are explained. The choice of a suitable model is highly dependent on the research question, FFR recordings, target variables, extracted features, and their data types. To promote understanding, an example project implemented in Python is provided, which demonstrates practical usage of several of the discussed models on a sample dataset of six FFR features and a target response label.
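In the spirit of the tutorial's Python example, here is a toy logistic regression on a synthetic dataset of six FFR-like features with a binary response-present label. The features, weights, and data are fabricated for this sketch and are not the tutorial's sample dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400

# Six made-up features (e.g., amplitude, SNR, pitch strength, ...) per recording.
X = rng.standard_normal((n, 6))
true_w = np.array([1.5, -2.0, 1.0, 0.0, 0.5, -1.0])       # invented ground truth
y = (X @ true_w + 0.3 * rng.standard_normal(n) > 0).astype(float)

def fit_logreg(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the logistic loss, with a bias column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))  # clip for stability
        w -= lr * Xb.T @ (p - y) / len(y)                    # gradient step
    return w

w = fit_logreg(X, y)
scores = np.hstack([X, np.ones((n, 1))]) @ w
accuracy = np.mean((scores > 0) == y.astype(bool))
```

A real analysis would hold out a test split and standardize features first; the point here is only the shape of the model the tutorial walks through: six numeric predictors, one binary target.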
Affiliation(s)
- Fuh-Cherng Jeng
- Communication Sciences and Disorders, Ohio University, Athens, Ohio
- Yu-Shiang Jeng
- Computer Science and Engineering, Ohio State University, Columbus, Ohio
10
Smith S. Translational Applications of Machine Learning in Auditory Electrophysiology. Semin Hear 2022; 43:240-250. [PMID: 36313047] [PMCID: PMC9605807] [DOI: 10.1055/s-0042-1756166]
Abstract
Machine learning (ML) is transforming nearly every aspect of modern life including medicine and its subfields, such as hearing science. This article presents a brief conceptual overview of selected ML approaches and describes how these techniques are being applied to outstanding problems in hearing science, with a particular focus on auditory evoked potentials (AEPs). Two vignettes are presented in which ML is used to analyze subcortical AEP data. The first vignette demonstrates how ML can be used to determine if auditory learning has influenced auditory neurophysiologic function. The second vignette demonstrates how ML analysis of AEPs may be useful in determining whether hearing devices are optimized for discriminating speech sounds.
Affiliation(s)
- Spencer Smith
- Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, Texas