
question regarding datasets #31

Open
pianoman4873 opened this issue Mar 5, 2017 · 2 comments

pianoman4873 commented Mar 5, 2017

Hello,
This is not an issue but rather a question:
Where can I get all the datasets you reported on in the paper?
Do you think that training on all the datasets together would improve the results?
What about training for various languages: do you think a model trained on text from mixed languages would perform better or worse than separate models handling each language?

And another question regarding phrases: Google's pretrained word2vec vectors also include phrases. Were those taken into account as well?

yoonkim (Owner) commented Mar 8, 2017

Hi, you can obtain all the datasets here:

https://github.com/harvardnlp/sent-conv-torch

Phrases from word2vec were not taken into account.
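
For reference, here is a minimal sketch (not part of the original thread) of one way to load the pretrained GoogleNews word2vec binary and keep only single-word entries, since phrase entries in that file are joined with underscores (e.g. `New_York`). It assumes gensim >= 4.0; the file path and the `lookup` helper are placeholders.

```python
# Minimal sketch: load the pretrained GoogleNews vectors and skip phrase
# entries (keys containing "_"), keeping only single-word embeddings.
# Assumes gensim >= 4.0 and that the .bin file is in the working directory.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Phrase entries such as "New_York" contain an underscore; filter them out.
single_words = [w for w in kv.key_to_index if "_" not in w]
print(f"{len(kv.key_to_index)} total entries, {len(single_words)} single words")

def lookup(token):
    """Return the embedding for a single-word token, or None if absent."""
    if "_" in token or token not in kv.key_to_index:
        return None
    return kv[token]
```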

pianoman4873 (Author)

thanks
