Symbolic Music Generation Based on Continuous Emotions

My paper, titled “Symbolic Music Generation Conditioned on Continuous-Valued Emotions”, is published in IEEE Access. The supplementary material is presented at the bottom of this page.

The paper is available on IEEE Xplore and on arXiv. The source code is available on GitHub. Our paper can be cited as:

@article{9762257,
  title={Symbolic music generation conditioned on continuous-valued emotions}, 
  author={Sulun, Serkan and Davies, Matthew E. P. and Viana, Paula},
  journal={IEEE Access}, 
  year={2022},
  volume={10},
  pages={44617-44626},
  doi={10.1109/ACCESS.2022.3169744}}

The complete set of samples is available through the Google Drive link. If you’d like to avoid using your Google account, please use an incognito browser window. You can listen to a small set of samples below.

The samples have at most 5 instruments, namely drums, bass guitar, electric guitar, piano, and strings. The MIDI files are rendered into WAV format using the FluidSynth software and the FluidR3_GM soundfont.
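As a rough illustration of the rendering step, the sketch below builds a FluidSynth command line and runs it from Python. The file names, the 44100 Hz sample rate, and the helper names are assumptions made for this example; they are not taken from the paper's code. It uses FluidSynth's standard options for offline rendering (-n, -i, -F, -r) and requires the fluidsynth executable to be installed.

```python
import subprocess
from pathlib import Path


def build_fluidsynth_command(midi_path, wav_path, soundfont_path, sample_rate=44100):
    """Build the FluidSynth command line that renders a MIDI file to WAV.

    The sample rate default is an assumption for this sketch.
    """
    return [
        "fluidsynth",
        "-ni",                   # no MIDI input driver, no interactive shell
        "-F", str(wav_path),     # fast-render the audio into this file
        "-r", str(sample_rate),  # output sample rate in Hz
        str(soundfont_path),     # e.g. the FluidR3_GM soundfont file
        str(midi_path),
    ]


def render_midi(midi_path, wav_path, soundfont_path, sample_rate=44100):
    """Run FluidSynth; needs the `fluidsynth` executable on the PATH."""
    cmd = build_fluidsynth_command(midi_path, wav_path, soundfont_path, sample_rate)
    subprocess.run(cmd, check=True)
    return Path(wav_path)
```

For example, render_midi("sample.mid", "sample.wav", "FluidR3_GM.sf2") would write sample.wav next to the input, assuming those (hypothetical) files exist.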


Constant conditioning

In the table below, the left, middle, and right columns contain samples generated with negative (unpleasant), neutral, and positive (pleasant) valence condition values, respectively. Similarly, the top, middle, and bottom rows contain samples generated with positive (excited), neutral, and negative (calm) arousal condition values, respectively. In each cell, we present samples generated by our three different models, named discrete-token (DT), continuous-token (CT), and continuous-concatenated (CC). Note that each sample is the first random sample generated with its configuration, so none of them are cherry-picked.


                            Valence
Arousal      Negative        Neutral         Positive
Positive     DT: CT: CC:     DT: CT: CC:     DT: CT: CC:
Neutral      DT: CT: CC:     DT: CT: CC:     DT: CT: CC:
Negative     DT: CT: CC:     DT: CT: CC:     DT: CT: CC:
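The layout of the table above can be sketched as a small configuration grid: three arousal levels by three valence levels, with one sample per model in each cell. The numeric values attached to "negative", "neutral", and "positive" below are placeholders chosen for illustration only, since this page names the levels but not the numbers, and the function name is hypothetical.

```python
from itertools import product

# Placeholder condition values (assumption); the page only names the levels.
LEVELS = {"negative": -0.8, "neutral": 0.0, "positive": 0.8}
MODELS = ("DT", "CT", "CC")


def sample_grid():
    """List every (arousal, valence, model) combination in the table's reading order."""
    arousal_rows = ("positive", "neutral", "negative")  # top to bottom
    valence_cols = ("negative", "neutral", "positive")  # left to right
    return [
        {"arousal": LEVELS[a], "valence": LEVELS[v], "model": m}
        for a, v, m in product(arousal_rows, valence_cols, MODELS)
    ]
```

The grid has 27 entries: 9 cells with 3 models each.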


Dynamic conditioning

I also present samples that are generated using dynamic conditioning, where the condition values change over time. I used the continuous-token (CT) and continuous-concatenated (CC) models, since only these two allow dynamic conditioning. Unlike the samples presented previously, these samples are cherry-picked.
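To make "condition values that change over time" concrete, here is a minimal sketch that produces one (valence, arousal) pair per generation step. The linear ramp, the [-1, 1] range, and the function names are assumptions for this example; this page does not state how the values were actually varied.

```python
def linear_schedule(start, end, num_steps):
    """Return num_steps values moving linearly from start to end (both included)."""
    if num_steps < 2:
        return [float(start)]
    step = (end - start) / (num_steps - 1)
    return [start + i * step for i in range(num_steps)]


def dynamic_conditions(valence_range, arousal_range, num_steps):
    """Pair per-step valence and arousal values into one condition per step."""
    valence = linear_schedule(*valence_range, num_steps)
    arousal = linear_schedule(*arousal_range, num_steps)
    return list(zip(valence, arousal))


# Increasing valence, decreasing arousal over five steps (illustrative range).
conditions = dynamic_conditions((-1.0, 1.0), (1.0, -1.0), 5)
```

Swapping the start and end of either range gives the other three combinations listed below.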

Increasing valence, increasing arousal

CT:    

CC:    

Decreasing valence, decreasing arousal

CT:    

CC:    

Increasing valence, decreasing arousal

CT:    

CC:    

Decreasing valence, increasing arousal

CT:    

CC:    


Cherry-picked samples

Here I present the cherry-picked samples generated for four basic emotions: happy, relaxed, sad, and angry.
These emotions occupy the four quadrants of the valence-arousal plane as shown below:

[Figure: Emotions]

ANGRY
DT:
CT:
CC:
HAPPY
DT:
CT:
CC:
SAD
DT:
CT:
CC:
RELAXED
DT:
CT:
CC:
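The quadrant layout described above can be written as a small lookup from a (valence, arousal) point to an emotion label. This is only an illustrative sketch: the function name is hypothetical, and treating a value of exactly zero as positive is an arbitrary choice made here.

```python
def quadrant_emotion(valence, arousal):
    """Map a (valence, arousal) point to the emotion named for its quadrant.

    Zero is treated as positive here, which is an assumption of this sketch.
    """
    if valence >= 0:
        return "happy" if arousal >= 0 else "relaxed"
    return "angry" if arousal >= 0 else "sad"
```

For instance, a point with negative valence and positive arousal falls in the "angry" quadrant.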




Related lightning talk at EPIA 2023:

Written on January 27, 2022