Paper | Website | Hugging Face Dataset
Run the following from the root of the repository:
```bash
conda env create -f environment.yaml
```
This will create a conda environment named `syntheory`.

The environment may be activated with:
```bash
conda activate syntheory
```
If there are problems installing from `environment.yaml`, the main dependencies we need are:
- `pytorch` (see the PyTorch installation instructions)
- `transformers` (`pip install transformers`)
- `jukemirlib` (`pip install git+https://github.com/rodrigo-castellon/jukemirlib.git`)
- `ffmpeg-python` (`pip install ffmpeg-python`)
- `mido` (`pip install mido`)
- `zarr` (`pip install zarr`)
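If you installed the dependencies by hand, a quick way to confirm they resolved is to import them. This is a minimal sketch; it only checks that the imports succeed:
```python
# Minimal import check for the main Python dependencies listed above.
import torch
import transformers
import jukemirlib   # installed from the jukemirlib GitHub repository
import ffmpeg       # provided by the ffmpeg-python package
import mido
import zarr

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
```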
You may need to install the following dependencies, if you don't already have them.
Requires `ffmpeg`.

On Linux, install it through a package manager like apt:
```bash
apt install ffmpeg
```
On macOS, it can be installed through Homebrew:
```bash
brew install ffmpeg
```
If running on an M1 Mac, you must also install `libsndfile`:
```bash
brew install libsndfile
```
Note that even after installing `libsndfile`, `OSError: sndfile library not found` might be raised. To fix this, run:
```bash
conda install -c conda-forge libsndfile
```
To compile the synthesizer, first install `cargo`. Then, run:
```bash
bash repo/compile_synth.sh
```
We need this to turn MIDI into a `.wav` file. A bit more information on this step can be found here: README.md
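As a quick check that the `ffmpeg` binary is visible to `ffmpeg-python`, you can probe any audio file you already have on disk. This is a minimal sketch; the file path is a placeholder:
```python
import ffmpeg  # ffmpeg-python wrapper around the ffmpeg/ffprobe binaries

# Probing succeeds only if ffprobe (installed alongside ffmpeg) is on the PATH.
info = ffmpeg.probe("path/to/some_audio_file.wav")
print(info["format"]["format_name"], info["format"]["duration"])
```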
The following datasets exist in the dataset/synthetic folder:
- `chords`: ~18.7 GB (13,248 samples)
- `chord_progressions`: ~29.61 GB (20,976 samples)
- `intervals`: ~56.1 GB (39,744 samples)
- `notes`: ~14.02 GB (9,936 samples)
- `scales`: ~21.82 GB (15,456 samples)
- `tempos`: ~5.68 GB (4,025 samples)
- `time_signatures`: ~1.48 GB (1,200 samples)
Any of these can be generated by running the below command from the root of the repository:
```bash
python dataset/synthetic/<NAME_OF_DATASET>.py
```
where `<NAME_OF_DATASET>` is one of: `chords`, `chord_progressions`, `intervals`, `notes`, `scales`, `tempos`, or `time_signatures`.
This will create a folder at `data/<NAME_OF_DATASET>`.
Each folder will contain:
- `info.csv`
- `.wav` files
- `.mid` files

Note that some of these datasets are quite large.
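After generating a dataset, you can take a quick look at its metadata. This is a minimal sketch using the `notes` dataset as an example; it assumes pandas is available (not listed among the core dependencies above), and the exact columns in `info.csv` depend on the concept:
```python
import pandas as pd

# Each generated dataset writes an info.csv describing its samples.
info = pd.read_csv("data/notes/info.csv")
print(info.shape)   # number of samples and metadata columns
print(info.head())  # first few rows of per-sample metadata
```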
You can also download our dataset through Hugging Face here.
To download a particular concept (e.g. notes), run the following script:
```python
from datasets import load_dataset

notes = load_dataset("meganwei/syntheory", "notes")
```
You can also access our dataset in streaming mode, instead of downloading the entire dataset to disk, by running the following (for each desired concept):
```python
from datasets import load_dataset

notes = load_dataset("meganwei/syntheory", "notes", streaming=True)
print(next(iter(notes)))
```
We have a short guide that explains how to use this codebase to create custom datasets. We encourage the community to create more complex and diverse concept definitions.
Note
We hope for SynTheory to be more than a static dataset: it is a framework and procedure for creating music-theoretic concept understanding datasets.
Custom Dataset Instruction Guide
After creating a synthetic dataset, run
```bash
python embeddings/extract_embeddings.py --config embeddings/emb.yaml
```
This will use the specified configuration to extract embeddings for each `.wav` file and save them to a zarr file. You can specify the models and concepts to extract embeddings from by editing `models` and `concepts` in the configuration file.
For each dataset and model combination, this will produce a CSV file named `data/<NAME_OF_DATASET>/<NAME_OF_DATASET>_<MODEL_HASH>_embeddings_info.csv`. Each file contains information about each embedding for that particular model and dataset.
Due to the time it may take to extract these embeddings, the dataset is partitioned into shards, each responsible for extracting up to some constant number of embeddings. This will start a SLURM job for each shard.
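Once extraction finishes, you can sanity-check the outputs. This is a minimal sketch: the use of the `notes` dataset, the `<MODEL_HASH>` placeholder, and the zarr store path are illustrative, and the exact zarr layout depends on your `emb.yaml`:
```python
import pandas as pd
import zarr

# Per-embedding metadata CSV written for each dataset/model combination.
info = pd.read_csv("data/notes/notes_<MODEL_HASH>_embeddings_info.csv")
print(info.head())

# Open the zarr store read-only and print a summary of its contents.
store = zarr.open("data/notes/<your_zarr_store>", mode="r")
print(store)
```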
To launch probing experiments, run
```bash
python probe/run_probes.py
```
You can specify which models and concepts you're probing by modifying the `sweep_config` argument. The sweep configurations are defined in `probe/probe_config.py`. Currently, we include configs for probing the handcrafted features, the MusicGen Audio Encoder, Jukebox, and the MusicGen decoder language models across all concepts.
In our implementation, we performed a hyperparameter search for the handcrafted features (CHROMA, MELSPEC, MFCC, HANDCRAFT) and the MusicGen Audio Encoder. For the Jukebox and MusicGen decoder models, we used a fixed set of hyperparameters and employed a layer selection process. You're welcome to adapt the models, concepts, and hyperparameters to your own needs by modifying `SWEEP_CONFIGS` in `probe/probe_config.py`.
Our implementation logs probing results to a Weights & Biases project. Before you run the script, make sure to log into your wandb account by running `wandb login` and providing your wandb API key.
When analyzing your results on Weights & Biases, the probing metric of interest is reported under `primary_eval_metric`.
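If you prefer to pull results programmatically rather than through the web UI, the Weights & Biases public API can fetch `primary_eval_metric` from each run. This is a minimal sketch; the entity and project names are placeholders for your own:
```python
import wandb

api = wandb.Api()
# Replace with the entity/project you logged the probes to.
runs = api.runs("<your_entity>/<your_project>")
for run in runs:
    print(run.name, run.summary.get("primary_eval_metric"))
```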
From the root of the repository, one may run tests with:
```bash
bash repo/test.sh
```
This will call `pytest` and `pytest-cov` to produce a coverage report.
If you find this useful, please cite us in your work.
```bibtex
@inproceedings{Wei2024-music,
  title={Do Music Generation Models Encode Music Theory?},
  author={Wei, Megan and Freeman, Michael and Donahue, Chris and Sun, Chen},
  booktitle={International Society for Music Information Retrieval},
  year={2024}
}
```