Expand functionality to different word embedding files #10

Open
dafnevk opened this issue Oct 22, 2021 · 2 comments


dafnevk commented Oct 22, 2021

Although there is a read.wordvectors function that can read in a plain text file with vectors, the predict.word2vec function only works on 'model' objects, which cannot be created from these word vector files.

Would it be possible to have the predict.word2vec function work on only the embedding matrix? This way, it would be possible to use it for all types of word vector models, e.g. trained with fasttext.

Contributor

jwijffels commented Oct 22, 2021

predict.word2vec does exactly the same as the function word2vec_similarity, which you can apply to two embedding matrices or vectors.

  • That will work on embeddings trained with this package, as training is optimised for that similarity,
  • but it might not be what you want if your embeddings were trained in another framework.

That being said, apply word2vec_similarity and see if it works for your embeddings.
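A minimal sketch of that suggestion, under some assumptions: the vectors are in the word2vec text layout (a header line with the number of rows and dimensions, then one token per line), the numbers below are made up for illustration, and the installed package version has a `type` argument on word2vec_similarity. Cosine similarity is used here because it may be a safer choice for embeddings trained elsewhere; that is a judgment call, not something the package prescribes.

```r
library(word2vec)

# Toy embedding file with invented values: "rows dims" header,
# then one token per line followed by its vector.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "3 4",
  "king 0.9 0.1 0.3 0.7",
  "queen 0.8 0.2 0.4 0.7",
  "apple 0.1 0.9 0.8 0.1"
), path)

# Read the plain text file into a numeric matrix with tokens as row names.
emb <- read.wordvectors(path, type = "txt")

# Similarity of one term against the whole embedding matrix.
# drop = FALSE keeps the query as a one-row matrix with its row name.
query <- emb["king", , drop = FALSE]
sims <- word2vec_similarity(query, emb, type = "cosine")

# Limit the result to the closest terms, similar to what predict() gives for a model.
nearest <- word2vec_similarity(query, emb, top_n = 2, type = "cosine")
print(nearest)
```

For a real file, replace `path` with the location of your own vectors; whether the resulting neighbours are meaningful depends on how those embeddings were trained.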

Contributor

jwijffels commented Oct 22, 2021

Note that if you need embedding models with subwords, you might as well use sentencepiece_download_model from the sentencepiece R package. This downloads sentencepiece tokenizers alongside the embedding model trained on Wikipedia. These are compatible with this R package.
