Data augmentation for SVHN dataset #71

Open
QiushiYang opened this issue Oct 13, 2021 · 4 comments
Comments

@QiushiYang

QiushiYang commented Oct 13, 2021

Many thanks for your clean code.

(1) There seems to be no dataloader code for the SVHN dataset. Could you describe the data pre-processing and augmentation methods (and the mean and std values used for normalization) that you used for SVHN during training?

(2) Also, for the results reported in Table 2 of your paper, which version of the SVHN dataset did you use to train the FixMatch models? There are two versions: one with 73257 training images and 26032 test images, and one that also includes the extra split of 531131 additional images.

@carlini
Collaborator

carlini commented Oct 13, 2021

The SVHN dataloader can be found here:

fixmatch/libml/data.py

Lines 297 to 300 in d4985a1

    d.update([DataSets.creator('svhn', seed, label, valid, augment_fn)
              for seed, label, valid in itertools.product(range(6), [10 * x for x in SAMPLES_PER_CLASS], [1, 5000])])
    d.update([DataSets.creator('svhn_noextra', seed, label, valid, augment_fn)
              for seed, label, valid in itertools.product(range(6), [10 * x for x in SAMPLES_PER_CLASS], [1, 5000])])

We run experiments with the smaller svhn_noextra dataset here.
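
For readers following along, here is a minimal sketch of what the comprehension in the snippet iterates over: every combination of seed, labeled-set size, and validation-set size for one dataset name. The values in SAMPLES_PER_CLASS and the `name.seed@label-valid` string produced by the hypothetical `format_name` helper are assumptions made for illustration; check libml/data.py for the actual values and naming.

```python
import itertools

# Assumption: illustrative values only; the real list is defined in libml/data.py.
SAMPLES_PER_CLASS = [1, 2, 3, 4, 5, 10, 25, 100, 400]


def enumerate_configs(name):
    """Return the (name, seed, label, valid) tuples the comprehension walks through."""
    # SVHN has 10 classes, so the labeled-set size is 10 * samples per class.
    labels = [10 * x for x in SAMPLES_PER_CLASS]
    return [(name, seed, label, valid)
            for seed, label, valid in itertools.product(range(6), labels, [1, 5000])]


def format_name(name, seed, label, valid):
    """Hypothetical helper: builds a dataset string in an assumed naming pattern."""
    return f"{name}.{seed}@{label}-{valid}"


configs = enumerate_configs('svhn_noextra')
print(len(configs))              # 6 seeds * 9 label sizes * 2 validation sizes
print(format_name(*configs[0]))
```

Under these assumptions, each of the two names (svhn and svhn_noextra) is registered with the same grid of seeds, label counts, and validation sizes, so switching between them only changes which unlabeled pool is used.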

@QiushiYang
Author

Thanks a lot!
Do you mean that you used the smaller version of the SVHN dataset for both training and testing, without svhn_extra?

@carlini
Collaborator

carlini commented Oct 18, 2021

Yeah, we used the smaller dataset to train. In the code, this dataset is called svhn_noextra. If you look at the ReMixMatch paper, I think we also have results for how much gain you get by moving to the full svhn dataset, and it was rather small.
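
To make the size difference between the two variants concrete, here is a small sketch of the arithmetic. It uses the split sizes published on the SVHN dataset page, and it assumes that the full svhn variant pools the train and extra splits while svhn_noextra uses the train split alone. The `training_pool_size` helper is hypothetical and not part of the repository.

```python
# Split sizes as published on the SVHN dataset page.
SVHN_SPLIT_SIZES = {'train': 73257, 'extra': 531131, 'test': 26032}


def training_pool_size(use_extra):
    """Hypothetical helper: number of training images with or without the extra split."""
    size = SVHN_SPLIT_SIZES['train']
    if use_extra:
        # Assumption: the full variant adds the extra split to the train split.
        size += SVHN_SPLIT_SIZES['extra']
    return size


noextra_size = training_pool_size(use_extra=False)
full_size = training_pool_size(use_extra=True)
print(noextra_size, full_size, round(full_size / noextra_size, 2))
```

Under that assumption the full pool is roughly eight times larger than the svhn_noextra pool, while the test split is the same in both cases.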

@QiushiYang
Author

Many thanks! I will have a try :-)
