Data

The datasets for the ZeroSpeech Challenge 2017 are provided here for download.

Development dataset

Training datasets (Tracks 1 & 2)

  • English dataset: english.zip (3.3G, md5sum: a3142800ae19c19b511c3cf36c0c38c3)

  • French dataset: french.zip (2.2G, md5sum: 63ca0abc3c4c664aa6f7210305449f67)

  • Mandarin dataset: mandarin.zip (228M, md5sum: fd1d952f8a6e9db82b9eb296acc4e11a)

Test datasets (Track 1)

  • English dataset: english_test.zip (2.9G, md5sum: 980aa3163b453b2baae5ffb2ff552bcc)

  • French dataset: french_test.zip (2.1G, md5sum: 74acf72cb278085e3432b74f2a3bb0c3)

  • Mandarin dataset: mandarin_test.zip (2.8G, md5sum: 6dc50bebdec961a5dd8b857ebeb0982e)

Surprise dataset

Training datasets (Tracks 1 & 2)

  • LANG1 dataset: LANG1.zip (2.1G, md5sum: be6dbfcd4083fd2572aafa8be78ec677)

  • LANG2 dataset: LANG2.zip (387M, md5sum: 28997056e9d2eeede09cd1a272a93fd3)

Test datasets (Track 1)

  • LANG1 dataset: LANG1_test.zip (1.1G, md5sum: 034c415d3bb4914f87bddaebbcb151a7)

  • LANG2 dataset: LANG2_test.zip (543M, md5sum: bde2ea1cf9196a3dc91f21439fd7a93f)

Dataset VADs

  • VAD (voice activity detection) files for the development and surprise datasets: 2017_vads.zip (1.3M, md5sum: c78b21df917b7de4d952d60492327a29)
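
Each archive is listed with an md5sum so that a download can be checked for corruption. The following is a minimal sketch of such a check in Python, assuming the archives were saved under their listed names in one directory. The helper names (md5_of_file, verify_downloads) and the EXPECTED_MD5 table are illustrative and not part of the challenge tooling; only a subset of the archives is shown, and the remaining entries can be added from the listing in the same way.

```python
import hashlib
from pathlib import Path

# Checksums copied from the listing above (subset only; extend as needed).
# This table and the function names below are illustrative assumptions.
EXPECTED_MD5 = {
    "english.zip": "a3142800ae19c19b511c3cf36c0c38c3",
    "french.zip": "63ca0abc3c4c664aa6f7210305449f67",
    "mandarin.zip": "fd1d952f8a6e9db82b9eb296acc4e11a",
    "2017_vads.zip": "c78b21df917b7de4d952d60492327a29",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use low for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory=".", expected=EXPECTED_MD5):
    """Map each expected archive name to 'ok', 'MISMATCH', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        archive = Path(directory) / name
        if not archive.is_file():
            results[name] = "missing"
        elif md5_of_file(archive) == checksum:
            results[name] = "ok"
        else:
            results[name] = "MISMATCH"
    return results


if __name__ == "__main__":
    for name, status in verify_downloads().items():
        print(f"{name}: {status}")
```

Run from the download directory, the script prints one status line per archive. An archive reported as MISMATCH was most likely truncated or corrupted in transit and should be downloaded again.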