
Data

The datasets for the ZeroSpeech Challenge 2020 are available for download here. Note that the archives are password-protected; the password is provided once you have accepted the agreement below.

File	Description	Size	MD5 sum
zerospeech2020.z01	Data for the 2020 edition (1/3)	10.0 GB	c9906d9062744cec87f4a4048a0c551b
zerospeech2020.z02	Data for the 2020 edition (2/3)	10.0 GB	7eaa187d403c3aeef94e13f9053ce861
zerospeech2020.zip	Data for the 2020 edition (3/3)	6.0 GB	839a18a0dfe11c706428ddc27d87d5b8
baseline.zip	Baseline submission	6.8 GB	b5934920fcbb0b3af90611185696510b
2017_vads.zip	VAD for the 2017 wavs	1.3 MB	c78b21df917b7de4d952d60492327a29

The following script downloads and extracts the datasets:

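#!/bin/bash

# Replace with the password provided once the agreement below is accepted.
PASSWORD='XXXX_REPLACE_WITH_THE_PASSWORD_XXXX'

# Download the three parts of the split archive.
for ext in zip z01 z02
do
    wget "https://download.zerospeech.com/2020/zerospeech2020.$ext" || exit 1
done

# Extract the archive; the .z01 and .z02 parts must be in the same directory.
7z x -p"$PASSWORD" zerospeech2020.zip || exit 1

# Remove the downloaded archive parts.
rm -f zerospeech2020.z* || exit 1
exit 0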
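
The download script does not check the MD5 sums listed in the table. The following is a minimal sketch of how the three archive parts could be verified before extraction. It assumes the `md5sum` utility from GNU coreutils is available (on macOS, `md5 -q` would be the rough equivalent), and the function names `verify_md5` and `check_all` are hypothetical helpers introduced here for illustration, not part of the challenge tooling.

```shell
#!/bin/bash

# verify_md5 FILE EXPECTED
# Returns 0 if the MD5 sum of FILE equals EXPECTED, 1 on a mismatch,
# and 2 if FILE does not exist. Assumes GNU md5sum is installed.
verify_md5() {
    local file=$1 expected=$2 actual
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 2
    fi
    # md5sum prints "<hash>  <filename>"; keep only the hash.
    actual=$(md5sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
        return 0
    fi
    echo "MISMATCH: $file (got $actual, expected $expected)" >&2
    return 1
}

# check_all
# Verifies the three parts of the 2020 archive in the current directory
# against the MD5 sums from the table above.
check_all() {
    verify_md5 zerospeech2020.z01 c9906d9062744cec87f4a4048a0c551b &&
    verify_md5 zerospeech2020.z02 7eaa187d403c3aeef94e13f9053ce861 &&
    verify_md5 zerospeech2020.zip 839a18a0dfe11c706428ddc27d87d5b8
}
```

One way to use this would be to call `check_all || exit 1` between the download loop and the `7z x` step, so that a corrupted or partial download is caught before extraction starts.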

Agreement

In order to receive the archive password, please agree to the following terms regarding the surprise language data for the 2019 task:

The data may be used only for the Zero Resource Speech Challenge. Other usages, both research and commercial, are prohibited. The data in the corpus shall not be redistributed. It is permissible, however, to cite examples from the corpus to present research results. All reports/publications using the corpus must acknowledge its use via a citation to the paper describing the source corpus:

S. Sakti, R. Maia, S. Sakai, T. Shimizu, S. Nakamura, “Development of HMM-based Indonesian Speech Synthesis,” in Proc. O-COCOSDA, pp. 215-220, Kyoto, Japan, November 2008
S. Sakti, E. Kelana, H. Riza, S. Sakai, K. Markov, S. Nakamura, “Development of Indonesian Large Vocabulary Continuous Speech Recognition System within A-STAR Project,” in Proc. TCAST, pp. 19-24, Hyderabad, India, January 2008

Please accept the above agreement to download the dataset for the ZeroSpeech Challenge 2020.

The password protecting the archive zerospeech2020.zip is sM@pv7bT