Results

The columns can be sorted by clicking the sort icon in each column header. A detailed view of a result is available by clicking the details icon in its row.

The columns are interpreted as follows (see Evaluation metrics for details):

  • Phonetic (across and within)

    • ABX error rate on embeddings
    • Scale is $[0, 1]$, lower is better
  • Lexical and Syntactic

    • Mean correct / incorrect classification accuracy
    • Scale is $[0, 1]$, higher is better
    • For Lexical the all column is the mean accuracy over five frequency bins (based on raw frequency counts in LibriSpeech-960: OOV; 1-5; 6-20; 21-100; 101+), and the in vocab. column leaves out the OOV category. Only the all column was published in the Interspeech summary paper.
  • Semantic

    • Human judgement correlation coefficient ($\times 100$)
    • Scale is $[-100, 100]$, far from 0 is better
    • Mean score across all datasets
    • Semantic (Weighted): Same as Semantic with mean score weighted by the number of pairs in each dataset. Only the unweighted (Semantic) columns were published in the Interspeech summary paper.
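
The aggregated columns described above can be illustrated with a minimal sketch. It assumes that Lexical "all" and "in vocab." are plain means over the frequency bins, and that Semantic (Weighted) is a mean weighted by the number of pairs per dataset, as the descriptions suggest. All values, the dataset names, and the function names are hypothetical and chosen only for illustration; this is not the challenge's evaluation code.

```python
from statistics import mean

# Hypothetical per-bin lexical accuracies (illustrative values, not real results).
lexical_bins = {"OOV": 0.50, "1-5": 0.60, "6-20": 0.70, "21-100": 0.80, "101+": 0.90}


def lexical_all(bins):
    """Mean accuracy over all five frequency bins (the 'all' column)."""
    return mean(bins.values())


def lexical_in_vocab(bins):
    """Mean accuracy with the OOV bin left out (the 'in vocab.' column)."""
    return mean(acc for name, acc in bins.items() if name != "OOV")


# Hypothetical datasets: (correlation x 100, number of pairs).
semantic_scores = {"dataset_a": (10.0, 100), "dataset_b": (20.0, 300)}


def semantic_unweighted(scores):
    """Plain mean of the per-dataset correlations (the Semantic column)."""
    return mean(score for score, _ in scores.values())


def semantic_weighted(scores):
    """Mean weighted by the number of pairs (the Semantic (Weighted) column)."""
    total_pairs = sum(n_pairs for _, n_pairs in scores.values())
    return sum(score * n_pairs for score, n_pairs in scores.values()) / total_pairs


print(lexical_all(lexical_bins), lexical_in_vocab(lexical_bins))
print(semantic_unweighted(semantic_scores), semantic_weighted(semantic_scores))
```

With these made-up numbers the weighted score is pulled toward the larger dataset, which is why the two Semantic columns can differ for the same submission.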

|   |        |        |     | Phonetic (Within) |       | Phonetic (Across) |       | Lexical |           | Syntactic | Semantic |        | Semantic (Weighted) |        |
| # | Author | Budget | Set | clean             | other | clean             | other | all     | in vocab. |           | synth.   | libri. | synth.              | libri. |