Data

The dataset, the baseline submissions and a random submission are provided here for download. The dataset is released under a Creative Commons 4.0 licence. The baseline checkpoints include CPC checkpoints first released in the public domain by Facebook AI Research.

| File | Description | Size | MD5 sum |
|---|---|---|---|
| zerospeech2021-dataset.zip | Data for the 2021 edition | 24 GB | d196d4c9174f1bf2ce7111a19abddaca |
| zerospeech2021-submission-random.zip | Purely random submission provided as example | 0.2 GB | e58b62602f34fddc97a39a3ebf2b21ab |
| zerospeech2021-submission-baseline-bert.zip | Baseline submission (BERT) | 13 GB | 8544fe3fccb6ead94a6ae1e260240ca8 |
| zerospeech2021-submission-baseline-lstm.zip | Baseline submission (LSTM) | 17 GB | 994d1323b43376e7f03e6cd06e966e60 |
| topline_checkpoints.tar.gz | Topline checkpoints | 1.6 GB | 5dd0c31d37bd07a4ec52ac44ede0017f |
| baseline_checkpoints.tar.gz | Baseline checkpoints | 2.4 GB | 3c5cfeda5dca079f2c0c02b6cbeb08ed |
| baseline_checkpoints_VG.zip | Visually Grounded Baseline checkpoints | 1.9 GB | cd15403948f3d91ef1a3a58931418d26 |

The following commands will download and unzip the dataset:

wget https://download.zerospeech.com/2021/zerospeech2021-dataset.zip
unzip zerospeech2021-dataset.zip -d zerospeech2021_dataset
rm -f zerospeech2021-dataset.zip
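
Because the archives are large, it may be worth comparing a download against its listed MD5 sum before unzipping and deleting it. The following is a minimal sketch, not part of the official instructions: the helper name `verify_md5` is hypothetical, and it assumes the GNU coreutils `md5sum` command is available (other systems may ship a different tool).

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: returns 0 if the MD5 sum of FILE equals EXPECTED_MD5,
# and 1 otherwise. Assumes GNU coreutils `md5sum` is installed.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <filename>"; keep only the hash field.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Example call, using the dataset archive and the MD5 sum listed for it:
# verify_md5 zerospeech2021-dataset.zip d196d4c9174f1bf2ce7111a19abddaca
```

The check has to run after `wget` and before the `rm -f` step, since the archive no longer exists once it has been deleted. A mismatch usually points to an incomplete or corrupted download.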