Data

The dataset, the baseline submissions and a random submission are provided here for download. The dataset is released under a Creative Commons 4.0 licence. The baseline checkpoints include CPC checkpoints first released in the public domain by Facebook AI Research.

File                                         Description                                   Size    MD5 sum
zerospeech2021-dataset.zip                   Data for the 2021 edition                     24 GB   d196d4c9174f1bf2ce7111a19abddaca
zerospeech2021-submission-random.zip         Purely random submission provided as example  0.2 GB  e58b62602f34fddc97a39a3ebf2b21ab
zerospeech2021-submission-baseline-bert.zip  Baseline submission (BERT)                    13 GB   8544fe3fccb6ead94a6ae1e260240ca8
zerospeech2021-submission-baseline-lstm.zip  Baseline submission (LSTM)                    17 GB   994d1323b43376e7f03e6cd06e966e60
baseline_checkpoints.tar.gz                  Baseline checkpoints                          2.4 GB  3c5cfeda5dca079f2c0c02b6cbeb08ed

The following commands download the dataset, unzip it into zerospeech2021_dataset, and delete the archive:

wget https://download.zerospeech.com/2021/zerospeech2021-dataset.zip
unzip zerospeech2021-dataset.zip -d zerospeech2021_dataset
rm -f zerospeech2021-dataset.zip
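
Since the archive is large, it may be worth checking it against the MD5 sum listed above before unzipping and deleting it. The following is a minimal sketch, not part of the official instructions: verify_md5 is a hypothetical helper name, and it assumes the GNU md5sum tool is available (on macOS, md5 -q would be the rough equivalent).

```shell
# Hypothetical helper (not part of the ZeroSpeech tooling):
# succeeds if the MD5 sum of file $1 equals the expected hash $2.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage, to be run after wget and before unzip and rm:
# verify_md5 zerospeech2021-dataset.zip d196d4c9174f1bf2ce7111a19abddaca \
#     && echo "checksum OK" || echo "checksum mismatch" >&2
```

If the check fails, the download is most likely incomplete or corrupted, and downloading the file again is usually the simplest fix.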