1
Rogers HP, Hseu A, Kim J, Silberholz E, Jo S, Dorste A, Jenkins K. Voice as a Biomarker of Pediatric Health: A Scoping Review. CHILDREN (BASEL, SWITZERLAND) 2024; 11:684. [PMID: 38929263 PMCID: PMC11201680 DOI: 10.3390/children11060684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 05/24/2024] [Accepted: 05/29/2024] [Indexed: 06/28/2024]
Abstract
The human voice has the potential to serve as a valuable biomarker for the early detection, diagnosis, and monitoring of pediatric conditions. This scoping review synthesizes the current knowledge on the application of artificial intelligence (AI) in analyzing pediatric voice as a biomarker for health. The included studies featured voice recordings from pediatric populations aged 0-17 years, utilized feature extraction methods, and analyzed pathological biomarkers using AI models. Data from 62 studies were extracted, encompassing study and participant characteristics, recording sources, feature extraction methods, and AI models. Data from 39 models across 35 studies were evaluated for accuracy, sensitivity, and specificity. The review showed a global representation of pediatric voice studies, with a focus on developmental, respiratory, speech, and language conditions. The most frequently studied conditions were autism spectrum disorder, intellectual disabilities, asphyxia, and asthma. Mel-Frequency Cepstral Coefficients were the most utilized feature extraction method, while Support Vector Machines were the predominant AI model. The analysis of pediatric voice using AI demonstrates promise as a non-invasive, cost-effective biomarker for a broad spectrum of pediatric conditions. Further research is necessary to standardize the feature extraction methods and AI models utilized for the evaluation of pediatric voice as a biomarker for health. Standardization has significant potential to enhance the accuracy and applicability of these tools in clinical settings across a variety of conditions and voice recording types. Further development of this field has enormous potential for the creation of innovative diagnostic tools and interventions for pediatric populations globally.
Affiliation(s)
- Hannah Paige Rogers
- Department of Cardiology, Boston Children’s Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Anne Hseu
- Department of Otolaryngology, Boston Children’s Hospital, 333 Longwood Ave, Boston, MA 02115, USA
- Jung Kim
- Department of Pediatrics, Boston Children’s Hospital, Boston, MA 02115, USA
- Stacy Jo
- Department of Otolaryngology, Boston Children’s Hospital, 333 Longwood Ave, Boston, MA 02115, USA
- Anna Dorste
- Boston Children’s Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
- Kathy Jenkins
- Department of Cardiology, Boston Children’s Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
2
Huang D, Yu D, Zeng Y, Song X, Pan L, He J, Ren L, Yang J, Lu H, Wang W. Generalized Camera-Based Infant Sleep-Wake Monitoring in NICUs: A Multi-Center Clinical Trial. IEEE J Biomed Health Inform 2024; 28:3015-3028. [PMID: 38446652 DOI: 10.1109/jbhi.2024.3371687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
The infant sleep-wake behavior is an essential indicator of physiological and neurological system maturity, the circadian transition of which is important for evaluating the recovery of preterm infants from inadequate physiological function and cognitive disorders. Recently, camera-based infant sleep-wake monitoring has been investigated, but the challenges of generalization caused by variance in infants and clinical environments are not addressed for this application. In this paper, we conducted a multi-center clinical trial at four hospitals to improve the generalization of camera-based infant sleep-wake monitoring. Using the face videos of 64 term and 39 preterm infants recorded in NICUs, we proposed a novel sleep-wake classification strategy, called consistent deep representation constraint (CDRC), that forces the convolutional neural network (CNN) to make consistent predictions for the samples from different conditions but with the same label, to address the variances caused by infants and environments. The clinical validation shows that by using CDRC, all CNN backbones obtain over 85% accuracy, sensitivity, and specificity in both the cross-age and cross-environment experiments, improving the ones without CDRC by almost 15% in all metrics. This demonstrates that by improving the consistency of the deep representation of samples with the same state, we can significantly improve the generalization of infant sleep-wake classification.
3
Khalilzad Z, Tadj C. Use of psychoacoustic spectrum warping, decision template fusion, and neighborhood component analysis in newborn cry diagnostic systems. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2024; 155:901-914. [PMID: 38310608 DOI: 10.1121/10.0024618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 01/10/2024] [Indexed: 02/06/2024]
Abstract
Dealing with newborns' health is a delicate matter since they cannot express their needs, and crying does not reflect their condition. Although newborn cries have been studied for various purposes, no prior research has addressed distinguishing one pathology from other pathologies. Here, a simple framework is proposed for the study of septic newborns amid a collective of other pathologies. The cry was analyzed with music-inspired and speech-processing-inspired features. Furthermore, neighborhood component analysis (NCA) feature selection was employed with two goals: (i) exploring how the elements of each feature set contributed to the classification outcome; (ii) investigating to what extent the feature space could be compacted. Both experiments introduced in this study succeeded: the decision template fusion (DTF) technique reached 88.66% accuracy, a consistent improvement over all individual feature sets, and the NCA feature selection method reached 86.22% while drastically downsizing the feature space from 86 elements to only 6. The results showed great potential for identifying one pathology among others that may have similar effects on cry patterns, and they support the proposed framework.
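The abstract above names decision template fusion (DTF) without describing its configuration. The following is a minimal sketch of the general decision-template idea only: each sample's decision profile stacks the soft outputs of several classifiers, the template of a class is the mean profile of its training samples, and a new sample takes the label of the nearest template. The squared Euclidean distance, the toy numbers, and all function names are assumptions for illustration, not the authors' implementation.

```python
def mean_profile(profiles):
    """Element-wise mean of equally shaped decision profiles.

    Each profile is a list of rows, one row per classifier,
    one column per class (soft scores).
    """
    n = len(profiles)
    rows, cols = len(profiles[0]), len(profiles[0][0])
    return [[sum(p[i][j] for p in profiles) / n for j in range(cols)]
            for i in range(rows)]


def fit_templates(profiles, labels):
    """Build one decision template (mean profile) per class label."""
    by_class = {}
    for profile, label in zip(profiles, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: mean_profile(ps) for label, ps in by_class.items()}


def sq_dist(a, b):
    """Squared Euclidean distance between two profiles (assumed metric)."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def predict(templates, profile):
    """Return the label whose template is closest to the profile."""
    return min(templates, key=lambda label: sq_dist(templates[label], profile))


# Toy data: two hypothetical classifiers scoring two classes.
train_profiles = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.7, 0.3], [0.6, 0.4]],
    [[0.2, 0.8], [0.3, 0.7]],
    [[0.4, 0.6], [0.1, 0.9]],
]
train_labels = ["sepsis", "sepsis", "other", "other"]

templates = fit_templates(train_profiles, train_labels)
label_a = predict(templates, [[0.6, 0.4], [0.55, 0.45]])
label_b = predict(templates, [[0.35, 0.65], [0.3, 0.7]])
```

In this toy setup the first query lies closer to the "sepsis" template and the second closer to the "other" template; a real system would obtain the profiles from trained classifiers on held-out data.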
Affiliation(s)
- Zahra Khalilzad
- Department of Electrical Engineering, École de Technologie Supérieure, Université du Québec, Montréal, Québec H3C 1K3, Canada
- Chakib Tadj
- Department of Electrical Engineering, École de Technologie Supérieure, Université du Québec, Montréal, Québec H3C 1K3, Canada
4
Zayed Y, Hasasneh A, Tadj C. Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features. Diagnostics (Basel) 2023; 13:2107. [PMID: 37371002 DOI: 10.3390/diagnostics13122107] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/09/2023] [Revised: 06/09/2023] [Accepted: 06/13/2023] [Indexed: 06/29/2023]
Abstract
Early diagnosis of medical conditions in infants is crucial for ensuring timely and effective treatment. However, infants are unable to verbalize their symptoms, making it difficult for healthcare professionals to accurately diagnose their conditions. Crying is often the only way for infants to communicate their needs and discomfort. In this paper, we propose a medical diagnostic system for interpreting infants' cry audio signals (CAS) using a combination of different audio domain features and deep learning (DL) algorithms. The proposed system utilizes a dataset of labeled audio signals from infants with specific pathologies. The dataset covers two infant pathologies with high mortality rates, neonatal respiratory distress syndrome (RDS) and sepsis, alongside cries from healthy infants. The system employed the harmonic ratio (HR) as a prosodic feature, the Gammatone frequency cepstral coefficients (GFCCs) as a cepstral feature, and image-based features extracted from the spectrogram with a pretrained convolutional neural network (CNN); these were fused with the other features so that multiple domains contribute to the classification rate and accuracy of the model. Different combinations of the fused features were then fed into multiple machine learning algorithms, including random forest (RF), support vector machine (SVM), and deep neural network (DNN) models. Evaluation of the system using accuracy, precision, recall, F1-score, the confusion matrix, and the receiver operating characteristic (ROC) curve showed promising results for the early diagnosis of medical conditions in infants based on crying signals alone: the system achieved its highest accuracy, 97.50%, using the combination of the spectrogram, HR, and GFCC through the deep learning process.
The findings demonstrated the importance of fusing different audio features, especially the spectrogram, through the learning process rather than by simple concatenation, and of using deep learning algorithms to extract sparsely represented features for later use in classification, which improves the separation between different infant pathologies. The results outperformed the published benchmark paper by extending the task to multiclass classification (RDS, sepsis, and healthy), investigating a new type of feature (the spectrogram), and using a new feature fusion technique that fuses features through the learning process of the deep learning model.
Affiliation(s)
- Yara Zayed
- Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine
- Ahmad Hasasneh
- Department of Natural, Engineering and Technology Sciences, Faculty of Graduate Studies, Arab American University, Ramallah P.O. Box 240, Palestine
- Chakib Tadj
- Department of Electrical Engineering, École de Technologie Supérieure, Université du Québec, Montréal, QC H3C 1K3, Canada
5
Khalilzad Z, Tadj C. Using CCA-Fused Cepstral Features in a Deep Learning-Based Cry Diagnostic System for Detecting an Ensemble of Pathologies in Newborns. Diagnostics (Basel) 2023; 13:879. [PMID: 36900023 PMCID: PMC10000938 DOI: 10.3390/diagnostics13050879] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/27/2022] [Revised: 02/14/2023] [Accepted: 02/21/2023] [Indexed: 03/02/2023]
Abstract
Crying is one of the means of communication for a newborn. Newborn cry signals convey precious information about the newborn's health condition and their emotions. In this study, cry signals of healthy and pathologic newborns were analyzed for the purpose of developing an automatic, non-invasive, and comprehensive Newborn Cry Diagnostic System (NCDS) that identifies pathologic newborns from healthy infants. For this purpose, Mel-frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) were extracted as features. These feature sets were also combined and fused through Canonical Correlation Analysis (CCA), which provides a novel manipulation of the features that have not yet been explored in the literature on NCDS designs, to the best of our knowledge. All the mentioned feature sets were fed to the Support Vector Machine (SVM) and Long Short-term Memory (LSTM). Furthermore, two Hyperparameter optimization methods, Bayesian and grid search, were examined to enhance the system's performance. The performance of our proposed NCDS was evaluated with two different datasets of inspiratory and expiratory cries. The CCA fusion feature set using the LSTM classifier accomplished the best F-score in the study, with 99.86% for the inspiratory cry dataset. The best F-score regarding the expiratory cry dataset, 99.44%, belonged to the GFCC feature set employing the LSTM classifier. These experiments suggest the high potential and value of using the newborn cry signals in the detection of pathologies. The framework proposed in this study can be implemented as an early diagnostic tool for clinical studies and help in the identification of pathologic newborns.
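As background for the MFCC features mentioned above, the sketch below shows the mel-scale warping that underlies a mel filterbank: frequencies are mapped to mels, spaced evenly there, and mapped back to hertz. It uses the common HTK-style formula; the abstract does not state which variant, frequency range, or filter count the authors used, so the range of 0 to 8000 Hz, the 10 filters, and the function names are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_min, f_max, n_filters):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Assumed analysis band and filter count, for illustration only.
centers = mel_center_frequencies(0.0, 8000.0, 10)
for c in centers:
    print(f"{c:8.1f} Hz")
```

The printed centers crowd together at low frequencies and spread out at high frequencies, which is the perceptual warping that distinguishes mel-based cepstral coefficients from a linear-frequency cepstrum.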
6
Rezaee K, Ghayoumi Zadeh H, Qi L, Rabiee H, Khosravi MR. Can you Understand why I am Crying? A Decision-making System for Classifying Infants' Cry Languages Based on deepSVM Model. ACM T ASIAN LOW-RESO 2023. [DOI: 10.1145/3579032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Scientific and therapeutic advances in perinatology and neonatology have improved the survival prospects of preterm and extremely low birth weight infants. Infants' cries are a valuable noninvasive tool for monitoring their neurologic health, especially if they are premature. Automatic acoustic analysis and data mining are employed in this study to determine the discriminative features of preterm and full-term infant cries. The use of machine learning for recognizing sounds in a newborn's cry language has received less attention than previous methods for analyzing the sounds. Moreover, to extract appropriate features from infant cries, adequate knowledge and appropriate signal descriptors are required. Accordingly, to analyze infant cry language, we propose an approach that uses a fractal descriptor to extract discriminant features from spectrograms of windowed signals, followed by iterative neighborhood component analysis (iNCA) to select appropriate features. Additionally, an improved deep support vector machine (deepSVM) is used to classify the infants' crying types and their meanings. The proposed method is verified using a newborn sound dataset. In the classification of five types of crying perception based on various characteristics, 98.34% of all crying perceptions were recognized. Although many classes are examined, the fractal-based feature extraction method and our optimal classification achieve a much higher diagnostic accuracy than similar methods for analyzing baby crying language. The proposed method can overcome many problems associated with analyzing babies' crying sounds and understanding their language, such as uncertainty and unusual errors in classification.
Affiliation(s)
- Khosro Rezaee
- Department of Biomedical Engineering, Meybod University, Meybod, Iran
- Hossein Ghayoumi Zadeh
- Department of Electrical Engineering, Vali-E-Asr University of Rafsanjan, Rafsanjan, Iran
- Lianyong Qi
- College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
- Hamidreza Rabiee
- Department of Electrical Engineering, Karaj Branch, Islamic Azad University, Karaj, Iran
- Mohammad R. Khosravi
- Shandong Provincial University Laboratory for Protected Horticulture, Weifang University of Science and Technology, Weifang 262799, Shandong, China
7
Newborn Cry-Based Diagnostic System to Distinguish between Sepsis and Respiratory Distress Syndrome Using Combined Acoustic Features. Diagnostics (Basel) 2022; 12:2802. [PMID: 36428865 PMCID: PMC9689015 DOI: 10.3390/diagnostics12112802] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/12/2022] [Revised: 11/05/2022] [Accepted: 11/11/2022] [Indexed: 11/18/2022]
Abstract
Crying is the only means of communication for a newborn baby with its surrounding environment, but it also provides significant information about the newborn's health, emotions, and needs. The cries of newborn babies have long been known as a biomarker for the diagnosis of pathologies. However, to the best of our knowledge, exploring the discrimination of two pathology groups by means of cry signals is unprecedented. Therefore, this study aimed to distinguish septic newborns from those with Neonatal Respiratory Distress Syndrome (RDS) by employing the Machine Learning (ML) methods of Multilayer Perceptron (MLP) and Support Vector Machine (SVM). Furthermore, the cry signal was analyzed from the following two perspectives: (1) the musical perspective, by studying the spectral feature set of Harmonic Ratio (HR), and (2) the speech processing perspective, using the short-term feature set of Gammatone Frequency Cepstral Coefficients (GFCCs). In order to assess the role of employing features from both short-term and spectral modalities in distinguishing the two pathology groups, they were fused in one feature set named the combined features. The hyperparameters (HPs) of the implemented ML approaches were fine-tuned to fit each experiment. Finally, by normalizing and fusing the features originating from the two modalities, the overall performance of the proposed design was improved across all evaluation measures, achieving accuracies of 92.49% and 95.3% by the MLP and SVM classifiers, respectively. The MLP classifier was outperformed in terms of all evaluation measures presented in this study, except for the Area Under Curve of Receiver Operator Characteristics (AUC-ROC), which signifies the ability of the proposed design in class separation. The achieved results highlighted the role of combining features from different levels and modalities for a more powerful analysis of the cry signals, as well as including a neural network (NN)-based classifier.
Consequently, attaining a 95.3% accuracy for the separation of two entangled pathology groups of RDS and sepsis elucidated the promising potential for further studies with larger datasets and more pathology groups.
8
Khalilzad Z, Kheddache Y, Tadj C. An Entropy-Based Architecture for Detection of Sepsis in Newborn Cry Diagnostic Systems. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1194. [PMID: 36141080 PMCID: PMC9498202 DOI: 10.3390/e24091194] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/22/2022] [Revised: 08/18/2022] [Accepted: 08/22/2022] [Indexed: 06/16/2023]
Abstract
The acoustic characteristics of cries reflect an infant's health condition, and these characteristics have been acknowledged as indicators for various pathologies. This study focused on the detection of infants suffering from sepsis by developing a simplified design using acoustic features and conventional classifiers. The features for the proposed framework were Mel-frequency Cepstral Coefficients (MFCC), Spectral Entropy Cepstral Coefficients (SENCC) and Spectral Centroid Cepstral Coefficients (SCCC), which were classified through K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) classification methods. The performance of the different combinations of the feature sets was also evaluated based on several measures such as accuracy, F1-score and Matthews Correlation Coefficient (MCC). Bayesian Hyperparameter Optimization (BHPO) was employed to tailor the classifiers uniquely to fit each experiment. The proposed methodology was tested on two datasets of expiratory cries (EXP) and voiced inspiratory cries (INSV). The highest accuracy and F-score were 89.99% and 89.70%, respectively. This framework also implemented a novel feature selection method based on Fuzzy Entropy (FE) as a final experiment. By employing FE, the number of features was reduced by more than 40%, whereas the evaluation measures were not hindered for the EXP dataset and were even enhanced for the INSV dataset. Therefore, it was deduced through these experiments that an entropy-based framework is successful for identifying sepsis in neonates and has the advantage of achieving high performance with conventional machine learning (ML) approaches, which makes it a reliable means for the early diagnosis of sepsis in deprived areas of the world.
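The abstract above lists F1-score and the Matthews Correlation Coefficient (MCC) among its evaluation measures. As a reminder of how the two are computed for a binary task, here is a minimal sketch from confusion-matrix counts. The counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from the study.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from raw counts."""
    return 2 * tp / (2 * tp + fp + fn)


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC in [-1, 1]; returns 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced test set: 50 septic and 50 non-septic cries.
tp, fp, tn, fn = 45, 5, 45, 5
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"F1 = {f1:.2f}, MCC = {mcc:.2f}")
```

Unlike F1, MCC also uses the true-negative count, so it penalizes a classifier that does well on the positive class only by neglecting the negative class.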
9
Lahmiri S, Tadj C, Gargour C. Nonlinear Statistical Analysis of Normal and Pathological Infant Cry Signals in Cepstrum Domain by Multifractal Wavelet Leaders. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1166. [PMID: 36010830 PMCID: PMC9407617 DOI: 10.3390/e24081166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/25/2021] [Revised: 04/06/2022] [Accepted: 08/19/2022] [Indexed: 06/15/2023]
Abstract
Multifractal behavior in the cepstrum representation of healthy and unhealthy infant cry signals is examined by means of wavelet leaders and compared using the Student t-test. The empirical results show that both expiration and inspiration signals exhibit clear evidence of multifractal properties under healthy and unhealthy conditions. In addition, expiration and inspiration signals exhibit more complexity under healthy conditions than under unhealthy conditions. Furthermore, distributions of multifractal characteristics are different across healthy and unhealthy conditions. Hence, this study improves the understanding of infant crying by providing a complete description of its intrinsic dynamics to better evaluate its health status.
Affiliation(s)
- Salim Lahmiri
- Department of Supply Chain and Business Technology Management, John Molson School of Business, Concordia University, Montreal, QC H3G 1M8, Canada
- Department of Electrical Engineering, École de Technologie Supérieure, Montreal, QC H3C 1K3, Canada
- Chakib Tadj
- Department of Electrical Engineering, École de Technologie Supérieure, Montreal, QC H3C 1K3, Canada
- Christian Gargour
- Department of Electrical Engineering, École de Technologie Supérieure, Montreal, QC H3C 1K3, Canada
10
11
ZhuParris A, Kruizinga MD, van Gent M, Dessing E, Exadaktylos V, Doll RJ, Stuurman FE, Driessen GA, Cohen AF. Development and Technical Validation of a Smartphone-Based Cry Detection Algorithm. Front Pediatr 2021; 9:651356. [PMID: 33928059 PMCID: PMC8076575 DOI: 10.3389/fped.2021.651356] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/09/2021] [Accepted: 03/15/2021] [Indexed: 11/13/2022]
Abstract
Introduction: The duration and frequency of crying of an infant can be indicative of its health. Manual tracking and labeling of crying is laborious, subjective, and sometimes inaccurate. The aim of this study was to develop and technically validate a smartphone-based algorithm able to automatically detect crying. Methods: For the development of the algorithm a training dataset containing 897 5-s clips of crying infants and 1,263 clips of non-crying infants and common domestic sounds was assembled from various online sources. OpenSMILE software was used to extract 1,591 audio features per audio clip. A random forest classifying algorithm was fitted to identify crying from non-crying in each audio clip. For the validation of the algorithm, an independent dataset consisting of real-life recordings of 15 infants was used. A 29-min audio clip was analyzed repeatedly and under differing circumstances to determine the intra- and inter- device repeatability and robustness of the algorithm. Results: The algorithm obtained an accuracy of 94% in the training dataset and 99% in the validation dataset. The sensitivity in the validation dataset was 83%, with a specificity of 99% and a positive- and negative predictive value of 75 and 100%, respectively. Reliability of the algorithm appeared to be robust within- and across devices, and the performance was robust to distance from the sound source and barriers between the sound source and the microphone. Conclusion: The algorithm was accurate in detecting cry duration and was robust to various changes in ambient settings.
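The results above report accuracy, sensitivity, specificity, and positive and negative predictive values together. The sketch below shows how all five follow from one confusion matrix, and why a detector can combine very high specificity with a noticeably lower positive predictive value when the target class (crying) is rare. The counts are hypothetical, picked for easy arithmetic, and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on crying clips
        "specificity": tn / (tn + fp),   # recall on non-crying clips
        "ppv": tp / (tp + fp),           # precision of "crying" calls
        "npv": tn / (tn + fn),           # precision of "non-crying" calls
    }


# Hypothetical validation set: 40 crying clips among 1,000 clips.
metrics = diagnostic_metrics(tp=30, fp=10, tn=950, fn=10)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```

With only 4% of clips containing crying, ten false alarms are enough to pull the positive predictive value down to 0.75 even though specificity stays near 0.99, which is the same qualitative pattern the abstract reports.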
Affiliation(s)
- Matthijs D Kruizinga
- Centre for Human Drug Research, Leiden, Netherlands; Juliana Children's Hospital, Haga Teaching Hospital, The Hague, Netherlands; Leiden University Medical Centre, Leiden, Netherlands
- Max van Gent
- Centre for Human Drug Research, Leiden, Netherlands; Juliana Children's Hospital, Haga Teaching Hospital, The Hague, Netherlands
- Eva Dessing
- Centre for Human Drug Research, Leiden, Netherlands; Juliana Children's Hospital, Haga Teaching Hospital, The Hague, Netherlands
- Frederik E Stuurman
- Centre for Human Drug Research, Leiden, Netherlands; Leiden University Medical Centre, Leiden, Netherlands
- Gertjan A Driessen
- Juliana Children's Hospital, Haga Teaching Hospital, The Hague, Netherlands; Department of Pediatrics, Maastricht University Medical Centre, Maastricht, Netherlands
- Adam F Cohen
- Centre for Human Drug Research, Leiden, Netherlands; Leiden University Medical Centre, Leiden, Netherlands