The dataset, the baseline submissions and a random submission are provided here for download. The dataset is released under a Creative Commons 4.0 licence. The baseline checkpoints include CPC checkpoints first released in the public domain by Facebook AI Research.

| File | Description | Size | MD5 sum |
|------|-------------|------|---------|
| | Data for the 2021 edition | 24 GB | d196d4c9174f1bf2ce7111a19abddaca |
| | Purely random submission provided as example | 0.2 GB | e58b62602f34fddc97a39a3ebf2b21ab |
| | Baseline submission (BERT) | 13 GB | 8544fe3fccb6ead94a6ae1e260240ca8 |
| | Baseline submission (LSTM) | 17 GB | 994d1323b43376e7f03e6cd06e966e60 |
| baseline_checkpoints.tar.gz | Baseline checkpoints | 2.4 GB | 3c5cfeda5dca079f2c0c02b6cbeb08ed |
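
The MD5 sums listed above can be used to check that a download completed without corruption. The following is a minimal sketch, not part of the official instructions. It assumes the GNU coreutils `md5sum` command is available (on macOS, `md5 -q` would be the rough equivalent), and `verify_md5` is a hypothetical helper name introduced only for illustration.

```shell
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeeds only if FILE's MD5 digest equals EXPECTED_MD5.
# Assumes GNU coreutils md5sum is installed.
verify_md5() {
    file="$1"
    expected="$2"
    # md5sum prints "<digest>  <file>"; keep only the digest field.
    actual="$(md5sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}
```

For example, once the checkpoints archive has been downloaded, `verify_md5 baseline_checkpoints.tar.gz 3c5cfeda5dca079f2c0c02b6cbeb08ed` should print an `OK` line if the file is intact, and return a non-zero status otherwise.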

The following commands will download and unzip the dataset:

unzip -d zerospeech2021_dataset
rm -f