Emotion-based Symbolic Music Generation
While there are many symbolic music (MIDI) generators available, very few of them can be conditioned on high-level features such as emotions. Existing emotion-labeled MIDI datasets are limited, containing only a few thousand samples.
In this project, I first created an emotion-based MIDI dataset that is two orders of magnitude larger than the existing ones. The emotion labels are continuous valence-arousal values, enabling fine-grained conditioning. Then, I built multiple architectures for emotion-based symbolic music generation. To the best of my knowledge, this is the only transformer-based generator that can use continuous-valued conditions while processing discrete tokens.
The paper is available on arXiv (Sulun et al., 2022). The source code is available on GitHub. Below, I present some output samples.
Supplementary material
Constant conditioning
In the table below, the left, middle, and right columns contain samples generated with negative (unpleasant), neutral, and positive (pleasant) valence conditions, respectively. Similarly, the top, middle, and bottom rows contain samples generated with positive (excited), neutral, and negative (calm) arousal conditions, respectively. Each cell contains samples from my three models: discrete-token (DT), continuous-token (CT), and continuous-concatenated (CC). Every sample is the first one generated at random for its configuration, so none of them is cherry-picked.
Arousal \ Valence | Negative | Neutral | Positive |
---|---|---|---|
Positive | DT: CT: CC: | DT: CT: CC: | DT: CT: CC: |
Neutral | DT: CT: CC: | DT: CT: CC: | DT: CT: CC: |
Negative | DT: CT: CC: | DT: CT: CC: | DT: CT: CC: |
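The two continuous models differ in where the condition enters the network. The sketch below is a simplified illustration, not the code from the repository. It assumes, as the names suggest, that CT feeds a projected condition vector as one extra sequence position, while CC appends a condition vector to every token embedding. The helper names, the random stand-in embeddings, the projection weights, and the sizes `d_model` and `d_cond` are all hypothetical.

```python
import random


def embed_tokens(token_ids, d_model, seed=0):
    """Stand-in for a learned embedding table (hypothetical helper)."""
    rng = random.Random(seed)
    table = {}
    for t in token_ids:
        if t not in table:
            table[t] = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
    return [table[t] for t in token_ids]


def project_condition(valence, arousal, weights):
    """Linear map from (valence, arousal) to a vector.

    `weights` holds one (w_valence, w_arousal) pair per output dimension.
    """
    return [w_v * valence + w_a * arousal for w_v, w_a in weights]


def prepend_condition(embeddings, cond_vec):
    """Assumed CT style: the condition becomes one extra sequence position."""
    if len(cond_vec) != len(embeddings[0]):
        raise ValueError("condition vector must match the embedding size")
    return [cond_vec] + embeddings


def concatenate_condition(embeddings, cond_vec):
    """Assumed CC style: the condition is appended to every token embedding."""
    return [emb + cond_vec for emb in embeddings]


d_model, d_cond = 8, 4          # illustrative sizes only
tokens = [5, 17, 5, 42]         # arbitrary discrete token ids
emb = embed_tokens(tokens, d_model)

rng = random.Random(1)
w_token = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(d_model)]
w_concat = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(d_cond)]

# Same continuous condition (pleasant, calm) injected in the two ways.
ct_input = prepend_condition(emb, project_condition(0.8, -0.8, w_token))
cc_input = concatenate_condition(emb, project_condition(0.8, -0.8, w_concat))
```

In this sketch the CT input is one position longer than the token sequence, whereas the CC input keeps the sequence length and widens each vector by `d_cond`. In both cases the valence-arousal values stay continuous while the music itself remains a sequence of discrete tokens.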
Dynamic conditioning
I also present samples generated using dynamic conditioning, where the condition values change over time. I used the continuous-token (CT) and continuous-concatenated (CC) models, since only they allow dynamic conditioning. Unlike the samples above, these samples are cherry-picked.
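As a minimal sketch of what a time-varying condition can look like, the snippet below builds one (valence, arousal) pair per generation step for the four trajectories presented in this section. The linear ramp, the [-1, 1] value range, and the function names are assumptions made for illustration; the actual schedules used for these samples may differ.

```python
def linear_ramp(start, end, n_steps):
    """Return n_steps evenly spaced values from start to end (inclusive)."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    step = (end - start) / (n_steps - 1)
    return [start + i * step for i in range(n_steps)]


def dynamic_condition(valence_range, arousal_range, n_steps):
    """Pair up a valence ramp and an arousal ramp, one pair per step."""
    valence = linear_ramp(valence_range[0], valence_range[1], n_steps)
    arousal = linear_ramp(arousal_range[0], arousal_range[1], n_steps)
    return list(zip(valence, arousal))


LOW, HIGH = -1.0, 1.0  # assumed bounds of the condition values

# (valence range, arousal range) for each trajectory in this section.
TRAJECTORIES = {
    "increasing valence, increasing arousal": ((LOW, HIGH), (LOW, HIGH)),
    "decreasing valence, decreasing arousal": ((HIGH, LOW), (HIGH, LOW)),
    "increasing valence, decreasing arousal": ((LOW, HIGH), (HIGH, LOW)),
    "decreasing valence, increasing arousal": ((HIGH, LOW), (LOW, HIGH)),
}

schedule = dynamic_condition(*TRAJECTORIES["increasing valence, decreasing arousal"], n_steps=5)
# schedule == [(-1.0, 1.0), (-0.5, 0.5), (0.0, 0.0), (0.5, -0.5), (1.0, -1.0)]
```

Each pair in `schedule` would then be supplied to the model at the corresponding point of the generated sequence.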
Increasing valence, increasing arousal
CT:
CC:
Decreasing valence, decreasing arousal
CT:
CC:
Increasing valence, decreasing arousal
CT:
CC:
Decreasing valence, increasing arousal
CT:
CC:
Cherry-picked samples
Here I present cherry-picked samples generated for four basic emotions: happy, relaxed, sad, and angry.
These emotions occupy the four quadrants of the valence-arousal plane as shown below:
ANGRY DT: CT: CC: | HAPPY DT: CT: CC: |
---|---|
SAD DT: CT: CC: | RELAXED DT: CT: CC: |
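The quadrant layout above, with valence on the horizontal axis and arousal on the vertical axis, can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and treating a value of exactly zero as positive is an arbitrary choice made here.

```python
def quadrant_emotion(valence, arousal):
    """Map a valence-arousal pair to the basic emotion of its quadrant."""
    if valence >= 0:
        # Pleasant half of the plane.
        return "happy" if arousal >= 0 else "relaxed"
    # Unpleasant half of the plane.
    return "angry" if arousal >= 0 else "sad"


examples = {
    (0.8, 0.8): quadrant_emotion(0.8, 0.8),      # positive valence, positive arousal
    (0.8, -0.8): quadrant_emotion(0.8, -0.8),    # positive valence, negative arousal
    (-0.8, -0.8): quadrant_emotion(-0.8, -0.8),  # negative valence, negative arousal
    (-0.8, 0.8): quadrant_emotion(-0.8, 0.8),    # negative valence, positive arousal
}
```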
Related lightning talk at EPIA 2023: