Enterprise AI Analysis
A Meta-Analysis of Music Emotion Recognition Studies
Authors: TUOMAS EEROLA, CAMERON J. ANDERSON
Journal: ACM Computing Surveys
Executive Impact: Key Takeaways
This meta-analysis by Eerola and Anderson comprehensively reviews music emotion recognition (MER) models published between 2014 and 2024. Analyzing 34 studies and 290 models, it focuses on predictions of valence, arousal, and categorical emotions. Key findings include moderate accuracy for valence (r = 0.67) and higher accuracy for arousal (r = 0.81) in regression tasks, while classification models achieved an MCC of 0.87. The study highlights that linear and tree-based methods often outperform neural networks in regression, whereas NNs and SVMs excel in classification. The authors make critical recommendations for future MER research, advocating greater transparency, feature validation, and standardized reporting to enhance comparability and reliability. This work provides a crucial benchmark for the MER field, offering insights into model performance and the impact of feature sets, and underscoring the need for more diverse datasets and rigorous reporting practices to advance the state of the art.
Deep Analysis & Enterprise Applications
Select a topic to dive deeper, then explore the specific findings from the research, rebuilt as interactive, enterprise-focused modules.
Music's capacity to convey emotions has been a long-standing area of interest, predating modern AI applications. Early efforts focused on generative composition and later shifted to predicting emotion from structural cues. The introduction of Audio Mood Classification (AMC) in the Music Information Retrieval Evaluation eXchange (MIREX) in 2007 spurred significant research. Initial accuracy for mood classification was 52.65%, rising to 69.83% by the tenth annual AMC task. Regression models, introduced later, achieved 28.1% for valence and 58.3% for arousal in early studies, with arousal consistently proving easier to predict. A wide array of models, from linear regression to deep neural networks, has been employed.
MER studies utilize diverse datasets, incorporating features from music (audio, MIDI, metadata) and participants (demographics, surveys, physiological signals). Publicly shared datasets like DEAM and AMG1608 primarily feature Western pop music, range from 744 to 1,802 excerpts, and are manually annotated by human participants. Feature extraction relies on audio analysis suites such as OpenSMILE and the MIR Toolbox. The distinction between predictive and explanatory modeling frameworks is crucial: large datasets support complex predictive models, while smaller, curated datasets are valuable for explaining the musical factors that influence emotion. While visual-domain datasets are much larger (e.g., EmoSet with 118,102 images), MER datasets are resource-intensive to annotate directly, leading to efforts to infer emotions from tags (e.g., MTG-Jamendo, Music4All). Small datasets remain useful for testing new features or as reference standards.
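To make the feature-extraction step concrete, here is a minimal sketch using librosa as a stand-in for the dedicated suites named above (OpenSMILE, MIR Toolbox); the file path, feature choices, and summary statistics are illustrative assumptions, not the pipeline of any surveyed study.

```python
# Minimal clip-level feature extraction sketch. librosa stands in for
# suites like OpenSMILE or the MIR Toolbox; all choices here are assumptions.
import numpy as np
import librosa

def extract_features(path: str) -> np.ndarray:
    """Summarize an audio excerpt as a fixed-length feature vector."""
    y, sr = librosa.load(path, sr=22050, mono=True)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbre
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)  # brightness
    rms = librosa.feature.rms(y=y)                            # dynamics
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)            # rhythm

    # Collapse frame-level features to clip-level statistics (mean + std),
    # the common "functionals" step in MER feature pipelines.
    frames = np.vstack([mfcc, centroid, rms])
    return np.concatenate(
        [frames.mean(axis=1), frames.std(axis=1), np.atleast_1d(tempo)]
    )
```

A vector like this, computed for every excerpt, is what the regression and classification models discussed below consume as input.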
Predictive accuracy in MER has improved significantly over the past decade. Regression models for arousal/valence have seen peaks at 58%/28% (2008), 70%/26% (2010), and 67%/46% (2021). Classification rates increased from 53% to 70%, and then to 83%. Despite these improvements, comparing studies is challenging due to inconsistencies in metrics, models, and evaluation criteria, and valence remains harder to predict than arousal. Recent advances involve identifying more relevant feature sets, integrating multimodal data, and leveraging neural networks to learn features directly from audio. Other approaches address how emotions are represented across different contexts and languages. The "semantic gap" is often better understood as inherent measurement error arising from annotations, feature representations, and model limitations.
The meta-analysis highlights a critical need for improvements in MER research practices. Future reports should provide comprehensive information on data, models, and success metrics, including cross-validation (CV) descriptions, feature extraction techniques, and actual accuracy measures. The Matthews Correlation Coefficient (MCC) is recommended as the standard metric for classification and R² for regression. Transparency about inter-rater agreement and dataset annotation quality is crucial. Sharing models (code/notebooks) and features is highly encouraged, ideally through platforms like Zenodo or Kaggle, to facilitate direct comparison and reproducibility. Detailed reporting is essential for stimuli (genres, duration, sampling rates, encoding formats), features (types, extraction software, quantity, transformations, reduction methods), and models (types, tuning, CV). A key design decision is the choice of features: small, domain-knowledge-driven feature sets (common in music psychology) versus the large feature sets typical of machine learning pipelines. The study suggests the domain-knowledge approach currently leads to higher model accuracy.
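To make the recommended metrics concrete, here is a minimal scikit-learn sketch computing MCC for classification and R² for regression; the tiny label arrays are placeholders, not values from the paper.

```python
# Computing the recommended standard metrics with scikit-learn.
# The small arrays below are placeholder data for illustration only.
from sklearn.metrics import matthews_corrcoef, r2_score

# Classification: MCC is robust to class imbalance and ranges from -1 to +1.
y_true_cls = ["happy", "sad", "happy", "tender", "sad", "happy"]
y_pred_cls = ["happy", "sad", "tender", "tender", "sad", "sad"]
print("MCC:", matthews_corrcoef(y_true_cls, y_pred_cls))

# Regression: report R² (variance explained) rather than r alone,
# since r can mask systematic over- or under-prediction.
y_true_reg = [0.10, -0.45, 0.80, 0.30, -0.20]  # e.g., annotated valence
y_pred_reg = [0.05, -0.30, 0.60, 0.35, -0.10]
print("R²:", r2_score(y_true_reg, y_pred_reg))
```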
Model Performance by Model Type
| Model Type | Valence (r) | Arousal (r) | Classification (MCC) |
|---|---|---|---|
| Linear Methods (LMs) | 0.784 | 0.882 | 0.728 |
| Tree-based Methods (TMs) | 0.750 | 0.809 | 0.853 |
| Support Vector Machines (SVMs) | 0.539 | 0.796 | 0.870 |
| Neural Networks (NNs) | 0.473 | 0.660 | 0.931 |
LMs and TMs generally outperform NNs and SVMs in regression tasks (valence and arousal, measured by Pearson's r), while NNs and SVMs show superior performance in classification tasks (measured by MCC).
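A comparison of this kind can be run under a shared cross-validation protocol, as in the sketch below. The synthetic feature matrix, target, and model configurations are illustrative stand-ins for each family, not the exact setups of the surveyed studies.

```python
# Sketch: comparing model families on a regression target (e.g., valence)
# under one common CV protocol. X and y are synthetic placeholders; the
# models are illustrative representatives of each family in the table.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                        # placeholder features
y = X[:, 0] * 0.8 + rng.normal(scale=0.5, size=200)   # placeholder valence

models = {
    "LM (Ridge)": Ridge(alpha=1.0),
    "TM (RandomForest)": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVM (SVR)": SVR(kernel="rbf"),
    "NN (MLP)": MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: mean R² = {scores.mean():.3f}")
```

Holding the folds and scoring metric fixed across families is precisely what makes such comparisons interpretable, and what inconsistent reporting in the literature prevents.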
Impact of Data & Reporting Quality on MER
The meta-analysis revealed significant variability in quality control and reporting practices across MER studies. Many studies were excluded due to insufficient information on data, model architectures, or outcome measures. This lack of transparency impedes direct comparison and reproducibility of results. Standardized reporting and open sharing of datasets and code are crucial for advancing the field. As the authors note, "smaller datasets often come from music psychology studies, which put a premium on data quality (quality control of the ground-truth data and extracted features) rather than on dataset size and model techniques." The findings suggest that domain-knowledge-driven feature selection on smaller, high-quality datasets can yield comparable or even superior results to complex models trained on larger, less curated datasets for certain tasks. This highlights the importance of balancing data quantity with data quality and methodological rigor.
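One simple transparency measure for ground-truth quality is inter-rater agreement. A minimal sketch, assuming a small placeholder ratings matrix (rows = excerpts, columns = raters), computes it as the mean pairwise Pearson correlation between annotators; stronger designs may report ICC or Krippendorff's alpha instead.

```python
# Inter-rater agreement as mean pairwise Pearson correlation.
# The ratings matrix is a placeholder, not data from any surveyed study.
import numpy as np
from itertools import combinations

ratings = np.array([
    [ 0.8,  0.7,  0.9],
    [-0.2, -0.1, -0.4],
    [ 0.5,  0.6,  0.3],
    [ 0.1,  0.0,  0.2],
])  # 4 excerpts rated for valence by 3 raters

pairs = [np.corrcoef(ratings[:, i], ratings[:, j])[0, 1]
         for i, j in combinations(range(ratings.shape[1]), 2)]
print("Mean pairwise inter-rater r:", round(float(np.mean(pairs)), 3))
```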
Calculate Your Potential ROI with MER
Optimizing Music Emotion Recognition can yield significant returns by improving content recommendations, personalized user experiences, and efficient data processing.
Your Enterprise AI Roadmap
A typical MER implementation involves several key phases, tailored to your specific enterprise needs. Our approach focuses on iterative development and continuous improvement.
Phase 1: Discovery & Strategy
Understand current MER workflows, identify key emotional dimensions, and define success metrics. Conduct a data audit and initial model selection.
Phase 2: Data Curation & Feature Engineering
Develop high-quality, annotated datasets. Extract relevant musical features and experiment with feature reduction techniques.
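As an example of the feature-reduction step in this phase, here is a minimal sketch using standardization followed by PCA; the feature matrix is a random placeholder, and PCA is one common choice among several reduction techniques.

```python
# Sketch: reduce a large extracted feature set by standardizing it and
# projecting onto principal components retaining ~95% of the variance.
# X is a random placeholder for a real feature matrix.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.default_rng(1).normal(size=(500, 120))  # placeholder features

reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_reduced = reducer.fit_transform(X)
print(X_reduced.shape)  # far fewer columns, most variance retained
```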
Phase 3: Model Development & Training
Build and train computational models (LMs, NNs, SVMs, TMs). Implement robust cross-validation and hyperparameter tuning.
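A minimal sketch of the tuning step in this phase, assuming placeholder data: a grid search over an SVM classifier, cross-validated and scored with MCC as recommended above. The parameter grid and class labels are illustrative.

```python
# Sketch: hyperparameter tuning with cross-validation, scored by MCC.
# X and y are synthetic placeholders; the SVM grid is illustrative.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.metrics import matthews_corrcoef, make_scorer

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 4, size=300)  # placeholder 4-class emotion labels

search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    scoring=make_scorer(matthews_corrcoef),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print("Best params:", search.best_params_, "MCC:", round(search.best_score_, 3))
```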
Phase 4: Validation & Integration
Evaluate model performance against benchmarks using MCC (classification) and R² (regression). Integrate the MER system into existing platforms for real-time application.
Phase 5: Monitoring & Iteration
Continuously monitor model accuracy and retrain with new data. Incorporate user feedback for ongoing improvements.
Ready to Transform Your Music-Related Data Strategy?
Unlock the full potential of AI in understanding and applying music emotions. Our expert team is ready to help you implement a robust and scalable MER solution that drives real business value.