Emotion-based Symbolic Music Generation

While there are many symbolic music (MIDI) generators available, very few of them can be conditioned on high-level features such as emotions. Existing emotion-labeled MIDI datasets are limited, containing only a few thousand samples.

In this project, I first created an emotion-based MIDI dataset that is two orders of magnitude larger than the existing ones. The emotion labels are continuous valence-arousal values, enabling fine-grained conditioning. Then, I built multiple architectures for emotion-based symbolic music generation. To the best of my knowledge, this is the only transformer-based generator that can use continuous-valued conditions while processing discrete tokens.

The paper is available on arXiv (Sulun et al., 2022), and the source code is available on GitHub. Below, I present some output samples.

Supplementary material

Constant conditioning

In the table below, the left, middle, and right columns contain samples generated with negative (unpleasant), neutral, and positive (pleasant) valence condition values, respectively. Similarly, the top, middle, and bottom rows contain samples generated with positive (excited), neutral, and negative (calm) arousal condition values, respectively. Each cell contains samples generated by the three models: discrete-token (DT), continuous-token (CT), and continuous-concatenated (CC). Every sample is the first random sample generated with its configuration, so none of them are cherry-picked.


                      Valence
Arousal     Negative      Neutral       Positive
Positive    DT, CT, CC    DT, CT, CC    DT, CT, CC
Neutral     DT, CT, CC    DT, CT, CC    DT, CT, CC
Negative    DT, CT, CC    DT, CT, CC    DT, CT, CC
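
The three model names hint at how the valence-arousal condition might enter the model. The sketch below shows one plausible reading of those names; it is not the code from the paper or the repository. The value range [-1, 1], the number of bins, the embedding size, and every function name are assumptions made for illustration, and the random matrices stand in for learned weights.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size, kept tiny for readability


def make_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(condition, weights):
    """Linear map from a (valence, arousal) pair to a vector of len(weights[0])."""
    return [
        sum(c * weights[i][j] for i, c in enumerate(condition))
        for j in range(len(weights[0]))
    ]


def discrete_token(condition, n_bins=5):
    """DT reading: quantize each value in [-1, 1] into a bin index (extra token)."""
    bins = []
    for value in condition:
        clipped = max(-1.0, min(1.0, value))
        bins.append(min(int((clipped + 1.0) / 2.0 * n_bins), n_bins - 1))
    return bins


def continuous_token(token_embeddings, condition, weights):
    """CT reading: prepend the projected condition as one extra sequence element."""
    return [project(condition, weights)] + token_embeddings


def continuous_concatenated(token_embeddings, condition, weights):
    """CC reading: append the projected condition to every token's features."""
    cond_vec = project(condition, weights)
    return [emb + cond_vec for emb in token_embeddings]


rng = random.Random(0)
vocab = make_matrix(10, EMBED_DIM, rng)      # hypothetical 10-token vocabulary
embeddings = [vocab[t] for t in [3, 7, 1]]   # three discrete tokens, embedded
weights = make_matrix(2, EMBED_DIM, rng)
condition = (0.8, -0.4)                      # (valence, arousal)

print(discrete_token(condition))                                        # [4, 1]
print(len(continuous_token(embeddings, condition, weights)))            # 4
print(len(continuous_concatenated(embeddings, condition, weights)[0]))  # 8
```

Under this reading, the discrete-token variant loses resolution to binning, while the two continuous variants pass the condition values through unquantized.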


Dynamic conditioning

I also present samples generated using dynamic conditioning, where the condition values change over time. I used the continuous-token (CT) and continuous-concatenated (CC) models, since only they allow dynamic conditioning. Unlike the samples presented above, these samples are cherry-picked.
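
To make "condition values that change over time" concrete, here is a minimal sketch of a condition schedule. It assumes values in [-1, 1] and a linear ramp between two endpoints; the function name, the endpoints, and the number of steps are hypothetical, and how the models actually consume such a schedule is not shown here.

```python
def linear_schedule(start, end, n_steps):
    """Return n_steps (valence, arousal) pairs moving linearly from start to end."""
    if n_steps < 2:
        return [tuple(start)]
    return [
        tuple(s + (e - s) * i / (n_steps - 1) for s, e in zip(start, end))
        for i in range(n_steps)
    ]


# Hypothetical endpoints: increasing valence, decreasing arousal.
schedule = linear_schedule(start=(-1.0, 1.0), end=(1.0, -1.0), n_steps=5)
for step, (valence, arousal) in enumerate(schedule):
    print(step, valence, arousal)
```

Swapping the endpoints gives the other three combinations listed below.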

Increasing valence, increasing arousal

CT:    

CC:    

Decreasing valence, decreasing arousal

CT:    

CC:    

Increasing valence, decreasing arousal

CT:    

CC:    

Decreasing valence, increasing arousal

CT:    

CC:    


Cherry-picked samples

Here I present cherry-picked samples generated for four basic emotions: happy, relaxed, sad, and angry.
These emotions occupy the four quadrants of the valence-arousal plane, as shown below:

                   Emotions

ANGRY
DT:
CT:
CC:
HAPPY
DT:
CT:
CC:
SAD
DT:
CT:
CC:
RELAXED
DT:
CT:
CC:
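
The quadrant layout above can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and assigning the zero (neutral) boundary to the positive side is an arbitrary assumption.

```python
def basic_emotion(valence, arousal):
    """Map a (valence, arousal) pair to the basic emotion of its quadrant."""
    if valence >= 0:
        return "happy" if arousal >= 0 else "relaxed"
    return "angry" if arousal >= 0 else "sad"


print(basic_emotion(0.8, 0.8))    # happy
print(basic_emotion(0.8, -0.8))   # relaxed
print(basic_emotion(-0.8, 0.8))   # angry
print(basic_emotion(-0.8, -0.8))  # sad
```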




Related lightning talk at EPIA 2023:

References

  1. Symbolic Music Generation Conditioned on Continuous-Valued Emotions
    Serkan Sulun, Matthew E. P. Davies, and Paula Viana
    IEEE Access, 2022