BERT pre-training approach #1376

shu1273 · 2022-12-23T14:22:04Z

Hi,

I have started working on the BERT model. Does anyone know what BERT's pre-training accuracy (not fine-tuned) was with the 100-0-0 masking approach versus the 80-10-10 approach? I could not find it anywhere.
Basically, I understand why the 80-10-10 approach is implemented, but did the authors run any experiments to arrive at it?
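
For concreteness, the two schemes being compared can be sketched as below. Of the tokens selected for prediction, 80-10-10 replaces 80% with [MASK], replaces 10% with a random token, and leaves 10% unchanged, while 100-0-0 always uses [MASK]. This is a simplified illustration and not the repository's data-generation code: `mask_tokens`, its parameters, and the toy vocabulary are hypothetical names, and each token is assumed to be selected independently with probability 0.15 rather than by sampling a fixed number of positions per sequence.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, split=(0.8, 0.1, 0.1),
                mask_token="[MASK]", seed=None):
    """Corrupt `tokens` for masked-LM training (hypothetical helper).

    split = (p_mask, p_random, p_keep) applies only to the selected tokens.
    Returns (corrupted, labels); labels[i] is None where nothing is predicted.
    """
    rng = random.Random(seed)
    p_mask, p_random, _p_keep = split
    corrupted, labels = [], []
    for tok in tokens:
        # Assumption: independent selection per token with probability mask_prob.
        if rng.random() >= mask_prob:
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must predict the original token here
        r = rng.random()
        if r < p_mask:
            corrupted.append(mask_token)         # replace with [MASK]
        elif r < p_mask + p_random:
            corrupted.append(rng.choice(vocab))  # replace with a random token
        else:
            corrupted.append(tok)                # keep the original token
    return corrupted, labels


# Toy data, purely illustrative.
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]

scheme_80_10_10 = mask_tokens(sentence, vocab, split=(0.8, 0.1, 0.1), seed=0)
scheme_100_0_0 = mask_tokens(sentence, vocab, split=(1.0, 0.0, 0.0), seed=0)
```

Under 100-0-0 every predicted position shows [MASK] in the input, so the model never sees a real token at a position it has to predict; under 80-10-10 some predicted positions carry the original or a random token. The question above is whether pre-training accuracy was reported for both settings.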

shu1273 closed this as completed Dec 23, 2022

shu1273 reopened this Dec 23, 2022
