
How to directly use word embedding from the pre-trained LM during training and inference? #7

Open
Fishersponge opened this issue Jul 15, 2020 · 4 comments

Comments

@Fishersponge commented Jul 15, 2020

If I have a 'train.mdb', how can I use the fastText pre-trained model cc.en.300.bin? I see nothing about fastText in your ./models, trainers.py, or main.py. Looking forward to your answer, thanks!

@Pay20Y (Owner) commented Jul 26, 2020

Hi, please refer to create_all_synth_lmdb.py and modify the dataloader accordingly.
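For the dataloader route, a minimal sketch might look like the following, assuming the official `fasttext` Python bindings and an existing LMDB dataset whose `__getitem__` returns an (image, label) pair; the wrapper class name here is hypothetical and the repository's actual loader will differ:

```python
# Minimal sketch: attach a fastText word embedding to each sample on the fly.
# `base_dataset` is assumed to be the repo's existing LMDB dataset whose
# __getitem__ returns an (image, label) pair; the wrapper name is hypothetical.
import fasttext
import numpy as np
import torch
from torch.utils.data import Dataset

class EmbeddingLmdbDataset(Dataset):
    def __init__(self, base_dataset, fasttext_path="cc.en.300.bin"):
        self.base = base_dataset
        self.ft = fasttext.load_model(fasttext_path)  # 300-d pre-trained vectors

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        image, label = self.base[index]
        # fastText builds vectors from subword n-grams, so OOV labels still work.
        vec = self.ft.get_word_vector(label.lower()).astype(np.float32)
        return image, label, torch.from_numpy(vec)
```

The embedding can then serve as the supervision target for the semantic module, with the base LMDB left untouched.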

@Ma01180724

Hello, I have the same question. We have the NIPS 2014 and CVPR 2016 datasets (LMDB); how can we use the fastText pre-trained model? Can you help? Thanks.

@Ma01180724

@Pay20Y, could you share the datasets that you have prepared?

@Pay20Y (Owner) commented Aug 2, 2020

@Ma01180724 Hi, I'm sorry, but I can't share the training datasets directly because of their large size. There are two ways to prepare them yourself. First, you can modify create_all_synth_lmdb.py so that it loads the labels from MJ and ST and then generates new LMDB datasets with embedding labels. Second, as mentioned above, you can modify the dataloader to generate the corresponding word embeddings from the recognition labels during training.
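To make the first option concrete, here is a hedged sketch of what a modified create_all_synth_lmdb.py might do. The paths are placeholders, the `embed-%09d` output key is an assumption, and the `num-samples`/`label-%09d` keys follow the usual scene-text LMDB convention and should be checked against the actual script:

```python
# Sketch of the first option: read labels from an MJ/ST LMDB and write a new
# LMDB that also stores each label's fastText embedding. The key layout
# ('num-samples', 'label-%09d') follows the common scene-text LMDB convention;
# the 'embed-%09d' key is hypothetical, and image records are omitted for brevity.
import fasttext
import lmdb
import numpy as np

ft = fasttext.load_model("cc.en.300.bin")

src = lmdb.open("path/to/mj_or_st_lmdb", readonly=True, lock=False)
dst = lmdb.open("path/to/output_lmdb", map_size=1 << 40)

with src.begin() as rtxn, dst.begin(write=True) as wtxn:
    num_samples = int(rtxn.get(b"num-samples"))
    for i in range(1, num_samples + 1):
        label = rtxn.get(b"label-%09d" % i).decode("utf-8")
        vec = ft.get_word_vector(label.lower()).astype(np.float32)
        wtxn.put(b"label-%09d" % i, label.encode("utf-8"))
        wtxn.put(b"embed-%09d" % i, vec.tobytes())  # raw 300-float payload
    wtxn.put(b"num-samples", str(num_samples).encode("utf-8"))
```

Pre-computing the embeddings trades disk space for faster training, since the fastText model no longer needs to be loaded in every dataloader worker.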
