Skip to content

Latest commit

 

History

History
23 lines (14 loc) · 1.73 KB

README.md

File metadata and controls

23 lines (14 loc) · 1.73 KB

tAPAS

Assisted Parameter optimization by Approximating neighbourhood Similarity

Bayesian optimization of hyperparameters for t-SNE in the context of word embeddings - i. e.: Input is a word embedding (labels + coordinates in high-dimensional space). The optimization procedure samples the parameter space to generate low-dimensional approximations of the original word embeddig data using t-SNE. The quality/truthfulness of the resulting models are evaluated with several metrics:

  • Trustworthiness: Measure for proportion of points too close together1 in the low-dimensional space.
  • Continuity: Measure for proportion of points too far apart1 in the low-dimensional space.
  • Generalization: Generalization error of 1-nearest neighbour classifier (e. g. word embedding is clustered in high-dimensional and low-dimensional space - the higher the similarity between the cluster labels, the lower the generalization error).
  • Relative word embedding quality: QVEC [1] is used to evalute the intrinsic quality of the original word embedding and its dimensionality-reduced projection. The ratio is referred to as 'relative word embedding quality'.

The first three measures were chosen following [2].

Main View

Generation of New Runs

1: In terms of neighbourhood ranks, not absolute distances.


[1] Y. Tsvetkov, M. Faruqui, W. Ling, G. Lample, and C. Dyer, “Evaluation of Word Vector Representations by Subspace Alignment,” 2015, pp. 2049–2054.

[2] L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, Dimensionality Reduction: A Comparative Review. 2008.