1.
A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition. Sensors (Basel) 2024; 24:2201. [PMID: 38610412] [PMCID: PMC11014202] [DOI: 10.3390/s24072201]
Abstract
Classical machine learning techniques have dominated Music Emotion Recognition (MER). However, progress has slowed because handcrafting new emotionally relevant audio features is a complex and time-consuming task. Deep learning methods have recently gained popularity in the field because they can automatically learn relevant features from spectral representations of songs, eliminating the need for manual feature engineering. They have limitations of their own, however, such as the need for large amounts of quality labeled data, a common problem in MER research. To assess the effectiveness of these techniques, a comparison study using various classical machine learning and deep learning methods was conducted. The results showed that an ensemble of a Dense Neural Network and a Convolutional Neural Network achieved a state-of-the-art 80.20% F1 score, an improvement of around 5% over the best baseline results. This suggests that future research should take advantage of both paradigms, that is, combine handcrafted features with feature learning.
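The concluding suggestion, combining handcrafted features with feature learning, is often implemented as late fusion. The sketch below is a minimal illustration under invented data, not the paper's models: class probabilities from two hypothetical pre-trained networks (one fed handcrafted features, one fed spectrograms) are averaged and scored with a macro F1.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical per-class emotion probabilities from two already-trained models:
# a dense network fed handcrafted features and a CNN fed Mel spectrograms.
dnn_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.2, 0.3, 0.5]])
cnn_probs = np.array([[0.6, 0.3, 0.1],
                      [0.3, 0.5, 0.2],
                      [0.1, 0.2, 0.7]])
true_labels = np.array([0, 1, 2])

# Late fusion: average the two probability distributions, then take the argmax.
ensemble_probs = (dnn_probs + cnn_probs) / 2.0
predictions = ensemble_probs.argmax(axis=1)

# Macro-averaged F1, as commonly reported in MER classification studies.
print(f1_score(true_labels, predictions, average="macro"))
```

Weighted averaging (trusting one model more) or stacking a meta-classifier on the concatenated probabilities are common variants of the same idea.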
2.
Intra- and inter-brain coupling and activity dynamics during improvisational music therapy with a person with dementia: an explorative EEG-hyperscanning single case study. Front Psychol 2023; 14:1155732. [PMID: 37842703] [PMCID: PMC10570426] [DOI: 10.3389/fpsyg.2023.1155732]
Abstract
Objective: Real-life research into the underlying neural dynamics of improvisational music therapy, used with various clinical populations, is largely lacking. This single case study explored within-session differences in musical features and in within- and between-brain coupling between a person with dementia (PwD) and a music therapist during a music therapy session. Methods: Dual-EEG from a music therapist and a PwD (male, 31 years) was recorded. Note density, pulse clarity, and synchronicity were extracted from audio-visual data. Three music therapists identified moments of interest and moments of no interest (MOI/MONI) in two drum improvisations. The Integrative Coupling Index, reflecting time-lagged neural synchronization, and the musical features were compared between the MOI and MONI. Results: Between-brain coupling of 2 Hz activity was increased during the MOI, showing anteriority of the therapist's neural activity. Within-brain coupling for the PwD was stronger from frontal and central areas during the MOI, whereas within-brain coupling for the therapist was stronger during the MONI. Differences in musical features indicated that both acted musically more similarly to one another during the MOI. Conclusion: Within-session differences in neural synchronization and musical features highlight the dynamic nature of music therapy. Significance: The findings contribute to a better understanding of social and affective processes in the brain and of (interactive) musical behaviors during specific moments in a real-life music therapy session. This may provide insights into the role of such moments for relational-therapeutic processes.
3.
Perceived rhythmic regularity is greater for song than speech: examining acoustic correlates of rhythmic regularity in speech and song. Front Psychol 2023; 14:1167003. [PMID: 37303916] [PMCID: PMC10250601] [DOI: 10.3389/fpsyg.2023.1167003]
Abstract
Rhythm is a key feature of music and language, but the way rhythm unfolds within each domain differs. Music induces perception of a beat, a regular repeating pulse spaced by roughly equal durations, whereas speech does not have the same isochronous framework. Although rhythmic regularity is a defining feature of music and language, it is difficult to derive acoustic indices of the differences in rhythmic regularity between domains. The current study examined whether participants could provide subjective ratings of rhythmic regularity for acoustically matched (syllable-, tempo-, and contour-matched) and acoustically unmatched (varying in tempo, syllable number, semantics, and contour) exemplars of speech and song. We used subjective ratings to index the presence or absence of an underlying beat and correlated ratings with stimulus features to identify acoustic metrics of regularity. Experiment 1 highlighted that ratings based on the term "rhythmic regularity" did not result in consistent definitions of regularity across participants, with opposite ratings for participants who adopted a beat-based definition (song greater than speech), a normal-prosody definition (speech greater than song), or an unclear definition (no difference). Experiment 2 defined rhythmic regularity as how easy it would be to tap or clap to the utterances. Participants rated song as easier to clap or tap to than speech for both acoustically matched and unmatched datasets. Subjective regularity ratings from Experiment 2 illustrated that stimuli with longer syllable durations and with less spectral flux were rated as more rhythmically regular across domains. Our findings demonstrate that rhythmic regularity distinguishes speech from song and that several key acoustic features can be used to predict listeners' perception of rhythmic regularity both within and across domains.
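The reported link between spectral flux and rated regularity can be illustrated with a toy computation. Everything below is synthetic (random spectrograms and made-up ratings), not the study's stimuli; it only shows how flux is typically derived and then correlated with ratings.

```python
import numpy as np

def spectral_flux(mag_spec):
    """Mean positive frame-to-frame change in a magnitude spectrogram
    (rows = frequency bins, columns = time frames)."""
    diff = np.diff(mag_spec, axis=1)
    return np.maximum(diff, 0.0).sum(axis=0).mean()

rng = np.random.default_rng(0)
# Invented stimuli: larger scale factors mean bigger frame-to-frame
# fluctuations, i.e. higher flux (more "speech-like" in this toy setup).
stimuli = [rng.random((64, 100)) * scale for scale in (0.2, 0.5, 1.0, 2.0)]
flux = np.array([spectral_flux(s) for s in stimuli])

# Invented regularity ratings: higher for the steadier stimuli.
ratings = np.array([4.5, 3.8, 2.9, 1.6])

# Pearson correlation between flux and rated regularity (expected negative).
r = np.corrcoef(flux, ratings)[0, 1]
print(r < 0)
```

A real analysis would compute flux per stimulus from short-time Fourier frames of the audio and correlate across all participants' ratings, but the shape of the computation is the same.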
4.
Diurnal fluctuations in musical preference. R Soc Open Sci 2021; 8:210885. [PMID: 34804568] [PMCID: PMC8580447] [DOI: 10.1098/rsos.210885]
Abstract
The rhythm of human life is governed by diurnal cycles, as a result of endogenous circadian processes evolved to maximize biological fitness. Even complex aspects of daily life, such as affective states, exhibit systematic diurnal patterns which in turn influence behaviour. Accordingly, previous research has identified population-level diurnal patterns in affective preference for music. By analysing audio features from over two billion music streaming events on Spotify, we find that the music people listen to divides into five distinct time blocks corresponding to morning, afternoon, evening, night and late night/early morning. By integrating an artificial neural network with Spotify's API, we show a general awareness of diurnal preference in playlists, which is not present to the same extent for individual tracks. Our results demonstrate how music intertwines with our daily lives and highlight how even something as individual as musical preference is influenced by underlying diurnal patterns.
5.
Motor performance in violin bowing: Effects of attentional focus on acoustical, physiological and physical parameters of a sound-producing action. J New Music Res 2021; 50:428-446. [PMID: 35611362] [PMCID: PMC7612762] [DOI: 10.1080/09298215.2021.1978506]
Abstract
Violin bowing is a specialised sound-producing action, which may be affected by psychological performance techniques. In sport, attentional focus impacts motor performance, but limited evidence for this exists in music. We investigated effects of attentional focus on acoustical, physiological, and physical parameters of violin bowing in experienced and novice violinists. Attentional focus significantly affected spectral centroid, bow contact point consistency, shoulder muscle activity, and novices' violin sway. Performance was most improved when focusing on tactile sensations through the bow (somatic focus), compared to sound (external focus) or arm movement (internal focus). Implications for motor performance theory and pedagogy are discussed.
6.
Time Signature Detection: A Survey. Sensors 2021; 21:6494. [PMID: 34640814] [PMCID: PMC8512143] [DOI: 10.3390/s21196494]
Abstract
This paper presents a thorough review of methods used in research articles published in the field of time signature estimation and detection from 2003 to the present. The purpose of this review is to investigate the effectiveness of these methods and how they perform on different types of input signals (audio and MIDI). The reviewed work is divided into two categories, classical and deep learning techniques, and is summarized in order to make suggestions for future study. More than 110 publications from top journals and conferences written in English were reviewed, and each selected study was fully examined to demonstrate the feasibility of the approach used, the dataset employed, and the accuracy obtained. The studies analyzed show that, in general, time signature estimation is a difficult process. However, success in this research area could be an added advantage in the broader area of music genre classification using deep learning techniques. Suggestions for improved estimates and future research projects are also discussed.
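A minimal example of the classical techniques this survey covers is autocorrelation of an onset-strength envelope at candidate bar lengths. The envelope below is synthetic, and the beat period is assumed known; a real system would first estimate it via beat tracking.

```python
import numpy as np

def estimate_beats_per_bar(onsets, beat_period, candidates=(3, 4)):
    """Pick the candidate bar length whose lag maximizes the
    autocorrelation of a beat-aligned onset-strength envelope."""
    ac = np.correlate(onsets, onsets, mode="full")[len(onsets) - 1:]
    scores = {c: ac[c * beat_period] for c in candidates}
    return max(scores, key=scores.get)

# Invented envelope: 8 frames per beat, an accent every 3rd beat (a 3/4 feel).
beat_period = 8
onsets = np.zeros(8 * beat_period * 6)
onsets[::beat_period] = 0.5          # onset strength on every beat
onsets[::3 * beat_period] = 1.0      # stronger downbeat every 3 beats

print(estimate_beats_per_bar(onsets, beat_period))
```

The autocorrelation at a lag of 3 beats aligns downbeats with downbeats, so that candidate scores highest; with a 4-beat accent pattern the comparison flips.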
7.
A Comparison of Human and Computational Melody Prediction Through Familiarity and Expertise. Front Psychol 2020; 11:557398. [PMID: 33362622] [PMCID: PMC7756065] [DOI: 10.3389/fpsyg.2020.557398]
Abstract
Melody prediction is an important aspect of music listening. The success of prediction, i.e., whether the next note played in a song is the same as the one predicted by the listener, depends on various factors. In this paper, we present two studies assessing how music familiarity and music expertise influence melody prediction in human listeners and, expressed in appropriate data and algorithmic terms, in computational models. To gather data on human listeners, we designed a melody prediction user study in which familiarity was controlled by two different music collections, while expertise was assessed by adapting the Music Sophistication Index instrument to the Slovenian language. In the second study, we evaluated the melody prediction accuracy of two computational models, the SymCHM and the Implication-Realization model, which differ substantially in how they approach melody prediction. Our results show that both music familiarity and expertise affect the prediction accuracy of human listeners, as well as that of computational models.
8.
Phrase-Level Modeling of Expression in Violin Performances. Front Psychol 2019; 10:776. [PMID: 31031671] [PMCID: PMC6470278] [DOI: 10.3389/fpsyg.2019.00776]
Abstract
Background: Expression is a key skill in music performance, and one that is difficult to address in music lessons. Computational models that learn from expert performances can help provide suggestions and feedback to students. Aim: We propose and analyze an approach to modeling variations in dynamics and note onset timing for solo violin pieces, with the purpose of facilitating expressive performance learning in new pieces for which no reference performance is available. Method: The method generates phrase-level predictions based on musical score information, on the assumption that expressiveness is idiomatic and thus influenced by similar-sounding melodies. Predictions were evaluated numerically, using three different datasets and against note-level machine-learning models, and also perceptually by listeners, who were presented with synthesized versions of musical excerpts and asked to choose the most human-sounding one. Some of the presented excerpts were synthesized to reflect the variations in dynamics and timing predicted by the model, whereas others were shaped to reflect the dynamics and timing of an actual expert performance, and a third group was presented with no expressive variations. Results: Surprisingly, none of the three synthesized versions was consistently selected as more human-like or preferred with statistical significance by listeners. Possible interpretations of these results include the possibility that the melodies could not be interpreted outside their musical context, or that expressive features left out of the modeling, such as note articulation and vibrato, are in fact essential to the perception of expression in violin performance. Positive feedback from some listeners toward the modeled melodies in a blind setting indicates that the modeling approach was capable of generating appropriate renditions, at least for a subset of the data. Numerically, phrase-level performance suffers a small degradation compared to note-level models, but produces predictions that are easier to interpret visually and thus more useful in a pedagogical setting.
9.
Correspondences Between Music and Involuntary Human Micromotion During Standstill. Front Psychol 2018; 9:1382. [PMID: 30131742] [PMCID: PMC6090462] [DOI: 10.3389/fpsyg.2018.01382]
Abstract
The relationships between human body motion and music have been the focus of several studies characterizing the correspondence between voluntary motion and various sound features. Research into involuntary movement to music, however, is still scarce. Insight into crucial aspects of music cognition, as well as characterization of the vestibular and sensorimotor systems, could be largely improved through a description of the underlying links between music and involuntary movement. This study presents an analysis aimed at quantifying involuntary body motion of a small magnitude (micromotion) during standstill, as well as assessing the correspondences between such micromotion and different sound features of the musical stimuli: pulse clarity, amplitude, and spectral centroid. A total of 71 participants were asked to stand as still as possible for 6 min while being presented with alternating silence and music stimuli: Electronic Dance Music (EDM), Classical Indian music, and Norwegian fiddle music (Telespringar). The motion of each participant's head was captured with a marker-based, infrared optical system. Differences in instantaneous position data were computed for each participant, and the resulting time series were analyzed through cross-correlation to evaluate the delay between motion and musical features. The mean quantity of motion (QoM) was found to be highest across participants during the EDM condition. This musical genre is based on a clear pulse and rhythmic pattern, and pulse clarity was indeed the metric with the most significant effect on induced vertical motion across conditions. Correspondences were also found between motion and both brightness and loudness, providing some evidence of anticipation and reaction to the music. Overall, the proposed analysis techniques provide quantitative data and metrics on the correspondences between micromotion and music, with the EDM stimulus producing the clearest music-induced motion patterns. The analysis and results are compatible with embodied music cognition and sensorimotor synchronization theories, and provide further evidence of the movement-inducing effects of groove-related music features and of human response to sound stimuli. Further work with larger datasets and a wider range of stimuli is necessary to produce conclusive findings on the subject.
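The cross-correlation step used to evaluate the delay between motion and musical features can be sketched as follows. The signals are synthetic stand-ins for a sound feature and a head-motion trace, with a known 12-sample delay injected so the recovered lag can be checked.

```python
import numpy as np

def lag_of_max_xcorr(x, y):
    """Lag (in samples) at which x best aligns with y; a positive value
    means y lags behind x."""
    xc = np.correlate(y - y.mean(), x - x.mean(), mode="full")
    return int(np.argmax(xc)) - (len(x) - 1)

rng = np.random.default_rng(1)
# Invented sound feature (e.g. pulse clarity over time) ...
feature = rng.standard_normal(500)
# ... and head motion responding to it 12 samples later, plus sensor noise.
motion = np.roll(feature, 12) + 0.1 * rng.standard_normal(500)

print(lag_of_max_xcorr(feature, motion))
```

A positive recovered lag would indicate reaction to the music, while a negative lag for some feature would be evidence of anticipation, the two behaviors the study reports.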
10.
Toward Studying Music Cognition with Information Retrieval Techniques: Lessons Learned from the OpenMIIR Initiative. Front Psychol 2017; 8:1255. [PMID: 28824478] [PMCID: PMC5541010] [DOI: 10.3389/fpsyg.2017.01255]
Abstract
As an emerging sub-field of music information retrieval (MIR), music imagery information retrieval (MIIR) aims to retrieve information from brain activity recorded during music cognition, such as listening to or imagining music pieces. This is a highly interdisciplinary endeavor that requires expertise in MIR as well as in cognitive neuroscience and psychology. The OpenMIIR initiative strives to foster collaborations between these fields to advance the state of the art in MIIR. As a first step, electroencephalography (EEG) recordings of music perception and imagination have been made publicly available, enabling MIR researchers to easily test and adapt their existing approaches for music analysis, such as fingerprinting, beat tracking, or tempo estimation, on this new kind of data. This paper reports first results of MIIR experiments using these OpenMIIR datasets and points out how these findings could drive new research in cognitive neuroscience.
11.
Qualitative and Quantitative Features of Music Reported to Support Peak Mystical Experiences during Psychedelic Therapy Sessions. Front Psychol 2017; 8:1238. [PMID: 28790944] [PMCID: PMC5524670] [DOI: 10.3389/fpsyg.2017.01238]
Abstract
Psilocybin is a classic (serotonergic) hallucinogen ("psychedelic" drug) that may occasion mystical experiences (characterized by a profound feeling of oneness or unity) during acute effects. Such experiences may have therapeutic value. Research and clinical applications of psychedelics usually include music listening during acute drug effects, based on the expectation that music will provide psychological support during the acute effects of psychedelic drugs, and may even facilitate the occurrence of mystical experiences. However, the features of music chosen to support the different phases of drug effects are not well-specified. As a result, there is currently neither real guidance for the selection of music nor standardization of the music used to support clinical trials with psychedelic drugs across various research groups or therapists. A description of the features of music found to be supportive of mystical experience will allow for the standardization and optimization of the delivery of psychedelic drugs in both research trials and therapeutic contexts. To this end, we conducted an anonymous survey of individuals with extensive experience administering psilocybin or psilocybin-containing mushrooms under research or therapeutic conditions, in order to identify the features of commonly used musical selections that have been found by therapists and research staff to be supportive of mystical experiences within a psilocybin session. Ten respondents yielded 24 unique recommendations of musical stimuli supportive of peak effects with psilocybin, and 24 unique recommendations of musical stimuli supportive of the period leading up to a peak experience. 
Qualitative analysis (expert rating of musical and music-theoretic features of the recommended stimuli) and quantitative analysis (using signal processing and music-information retrieval methods) of 22 of these stimuli yielded a description of peak-period music characterized by regular, predictable, formulaic phrase structure and orchestration, a feeling of continuous movement and forward motion that slowly builds over time, and lower perceptual brightness when compared to pre-peak music. These results provide a description of music that may be optimally supportive of peak psychedelic experiences. This description can be used to guide the selection and composition of music for future psychedelic research and therapy sessions.
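In music-information retrieval, perceptual brightness is typically proxied by the spectral centroid. The sketch below uses invented, exponentially decaying spectra rather than the study's recordings; it shows how energy concentrated at low frequencies yields a lower centroid.

```python
import numpy as np

def spectral_centroid(mag_frame, sample_rate, n_fft):
    """Amplitude-weighted mean frequency of one magnitude-spectrum frame,
    a standard proxy for perceived brightness."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return float((freqs * mag_frame).sum() / mag_frame.sum())

sr, n_fft = 22050, 2048
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

# Invented spectra: "peak period" music modeled with energy concentrated at
# low frequencies, "pre-peak" music with relatively more high-frequency energy.
peak_spec = np.exp(-freqs / 500.0)
pre_peak_spec = np.exp(-freqs / 2000.0)

peak_brightness = spectral_centroid(peak_spec, sr, n_fft)
pre_peak_brightness = spectral_centroid(pre_peak_spec, sr, n_fft)
print(peak_brightness < pre_peak_brightness)
```

In practice the centroid is computed per short-time frame of real audio and averaged over a track before comparing the peak and pre-peak playlists.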
12.
Acoustic Features Influence Musical Choices Across Multiple Genres. Front Psychol 2017; 8:931. [PMID: 28725200] [PMCID: PMC5495864] [DOI: 10.3389/fpsyg.2017.00931]
Abstract
Based on a large behavioral dataset of music downloads, two analyses investigate whether the acoustic features of listeners' preferred musical genres influence their choice of tracks within non-preferred, secondary musical styles. Analysis 1 identifies feature distributions for pairs of genre-defined subgroups that are distinct. Using correlation analysis, these distributions are used to test the degree of similarity between subgroups' main genres and the other music within their download collections. Analysis 2 explores the issue of main-to-secondary genre influence through the production of 10 feature-influence matrices, one per acoustic feature, in which cell values indicate the percentage change in features for genres and subgroups compared to overall population averages. In total, 10 acoustic features and 10 genre-defined subgroups are explored within the two analyses. Results strongly indicate that the acoustic features of people's main genres influence the tracks they download within non-preferred, secondary musical styles. The nature of this influence and its possible actuating mechanisms are discussed with respect to research on musical preference, personality, and statistical learning.
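A feature-influence matrix of the kind described in Analysis 2 can be reproduced in miniature: cell values are percentage changes of subgroup feature means relative to the overall population average. All numbers below are invented for illustration.

```python
import numpy as np

# Invented mean values of one acoustic feature (say, tempo in BPM):
# rows = listener subgroups defined by main genre,
# columns = genres of the tracks they download.
subgroup_means = np.array([[128.0, 122.0, 118.0],
                           [131.0, 126.0, 121.0],
                           [120.0, 115.0, 110.0]])
population_mean = 120.0

# Cell values: percentage change relative to the population average,
# mirroring the feature-influence matrices described above.
influence = 100.0 * (subgroup_means - population_mean) / population_mean
print(np.round(influence, 1))
```

One such matrix per acoustic feature, stacked over the ten features and ten subgroups, gives the structure the analysis explores.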
13.
Impaired Maintenance of Interpersonal Synchronization in Musical Improvisations of Patients with Borderline Personality Disorder. Front Psychol 2017; 8:537. [PMID: 28496420] [PMCID: PMC5407194] [DOI: 10.3389/fpsyg.2017.00537]
Abstract
Borderline personality disorder (BPD) is a serious and complex mental disorder with a lifetime prevalence of 5.9%, characterized by pervasive difficulties with emotion regulation, impulse control, and instability in interpersonal relationships and self-image. Impairments in interpersonal functioning have always been a prominent characteristic of BPD, indicating a need for research to identify the specific interpersonal processes that are problematic for diagnosed individuals. Previous research has concentrated on self-report questionnaires, unidirectional tests, and experimental paradigms wherein the exchange of social signals between individuals was not the focus. We propose joint musical improvisation as an alternative method to investigate interpersonal processes. Using a novel, carefully planned, ABA′ accompaniment paradigm, and taking into account the possible influences of mood, psychotropic medication, general attachment, and musical sophistication, we recorded piano improvisations of 16 BPD patients and 12 matched healthy controls. We hypothesized that the insecure attachment system associated with BPD would be activated in the joint improvisation and manifest in measures of timing behavior. Results indicated that a logistic regression model, built on differences in timing deviations, predicted diagnosis with 82% success. More specifically, over the course of the improvisation B section (freer improvisation), controls' timing deviations decreased (temporal synchrony became more precise) whereas those of the patients with BPD did not, confirming our hypothesis. These findings are in accordance with previous research, where BPD is characterized by difficulties in attachment relationships such as maintaining strong attachment with others, but it is novel to find empirical evidence of such issues in joint musical improvisation. We suggest further longitudinal research within the field of music therapy, to study how recovery of these timing habits is related to attachment experiences and interpersonal functioning in general.
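A diagnostic classifier of this kind can be sketched with a single hypothetical timing feature. The values below are invented and far cleaner than real improvisation data; the study itself reports 82%, not perfect, success from its timing-deviation measures.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented per-participant feature: change in timing deviation (ms) between
# early and late parts of the B section. Negative = synchrony tightened.
controls = np.array([-25.0, -18.0, -22.0, -30.0, -15.0, -20.0])
patients = np.array([3.0, -2.0, 7.0, 1.0, 5.0, 0.0, 8.0, 2.0])

X = np.concatenate([controls, patients]).reshape(-1, 1)
y = np.array([0] * len(controls) + [1] * len(patients))  # 1 = BPD group

# A single-feature logistic model, mirroring the diagnostic classifier above.
model = LogisticRegression().fit(X, y)
print(model.score(X, y))
```

With real data one would report cross-validated accuracy rather than the training score shown here, and would adjust for covariates such as mood and medication as the study does.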
14.
Characterizing Listener Engagement with Popular Songs Using Large-Scale Music Discovery Data. Front Psychol 2017; 8:416. [PMID: 28386241] [PMCID: PMC5362644] [DOI: 10.3389/fpsyg.2017.00416]
Abstract
Music discovery in everyday situations has been facilitated in recent years by audio content recognition services such as Shazam. The widespread use of such services has produced a wealth of user data, specifying where and when a global audience takes action to learn more about music playing around them. Here, we analyze a large collection of Shazam queries of popular songs to study the relationship between the timing of queries and corresponding musical content. Our results reveal that the distribution of queries varies over the course of a song, and that salient musical events drive an increase in queries during a song. Furthermore, we find that the distribution of queries at the time of a song's release differs from the distribution following a song's peak and subsequent decline in popularity, possibly reflecting an evolution of user intent over the "life cycle" of a song. Finally, we derive insights into the data size needed to achieve consistent query distributions for individual songs. The combined findings of this study suggest that music discovery behavior, and other facets of the human experience of music, can be studied quantitatively using large-scale industrial data.
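Locating the segment of a song that drives the most queries reduces to a histogram over query timestamps. The timestamps below are invented; a real analysis would aggregate millions of queries per song and normalize by song duration.

```python
import numpy as np

# Invented Shazam-style query timestamps (seconds into a 200 s song).
queries = np.array([12, 15, 14, 70, 71, 72, 73, 74, 75, 76, 150, 152])
song_length = 200

# Bin queries into 20 s segments and locate the segment with the largest
# share of queries, a crude proxy for a salient musical event.
counts, edges = np.histogram(queries, bins=10, range=(0, song_length))
peak_segment = int(np.argmax(counts))
print(peak_segment * 20, (peak_segment + 1) * 20)
```

Comparing such per-song histograms at release versus after the popularity peak is one way to quantify the shift in user intent the study describes.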
15.
Online Social Networks for Crowdsourced Multimedia-Involved Behavioral Testing: An Empirical Study. Front Psychol 2016; 6:1991. [PMID: 26793137] [PMCID: PMC4707286] [DOI: 10.3389/fpsyg.2015.01991]
Abstract
Online social networks have emerged in recent years as effective crowdsourcing media for recruiting participants. However, the question of how to exploit them effectively has not yet been adequately addressed. In this paper, we investigate the reliability and effectiveness of multimedia-involved behavioral testing via social network-based crowdsourcing, focusing in particular on Facebook as a medium to recruit participants. We conduct a crowdsourcing-based experiment for a music recommendation problem. It is shown that different advertisement methods yield different degrees of efficiency and that significant differences in behavioral patterns exist across genders and age groups. In addition, we compare our experiment with other multimedia-involved crowdsourcing experiments built on Amazon Mechanical Turk (MTurk), which suggests that crowdsourcing-based experiments using social networks for recruitment can achieve comparable efficiency. Based on the analysis results, advantages and disadvantages of social network-based crowdsourcing and suggestions for successful experiments are also discussed. We conclude that social networks have the potential to support multimedia-involved behavioral tests that gather in-depth data, even over long-term periods.