
Unable to reproduce the report's distribution metrics using SUPPORT #20

Open
JoeLill100 opened this issue Jul 5, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@JoeLill100

Hello,

I am trying to reproduce the distribution metrics established using SUPPORT, as stated on page 13 of the SynthVAE report.

I have downloaded available code and checked that my libraries are identical to those given in the requirements.txt file. I am using Python version 3.8.0.

I have run the following commands (one for each pre-processing method) on Windows in Command Prompt:

python scratch_vae_expts.py --pre_proc_method GMM

and:

python scratch_vae_expts.py --pre_proc_method Standard

I wasn't sure which pre-processing method was used in the report, but in both cases the distribution metrics I compute for the VAE model differ from those stated in the PDF. Could you suggest how to fix this? I have not modified the available code in any way. Perhaps the issue is due to seeding?

Thank you in advance.

@JoeLill100 JoeLill100 added the bug Something isn't working label Jul 5, 2022
@matthewcooper19

matthewcooper19 commented Aug 2, 2022

Hi Joe, as part of our work using SynthVAE in the synthetic data pipeline we found similar reproducibility issues. We found that any metrics that use sklearn components are not reproducible, and cannot be made so without changing the sdv code.

The reason for this is that setting the numpy random seed doesn't have the scope to set the sklearn random_state when it's imported from another file. As a result any metrics that use a sklearn component with a random_state argument will not be reproducible.

Metrics such as GMLogLikelihood and the detection metrics (e.g. logistic regression, support vector machine) all use sklearn, so they will be affected by this.

Although no fix is available at the moment, I wanted to add the above to give more info around the likely cause of this.
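To illustrate the general pattern (not SynthVAE's actual code): relying on a global seed is fragile, because any intervening call that consumes the global random state changes what downstream code gets, whereas passing an explicit seed or generator (analogous to sklearn's `random_state=` argument) pins the result down. A minimal sketch using numpy generators in place of a sklearn estimator:

```python
import numpy as np

# Fragile: depends on the global RandomState. An extra draw anywhere
# between the seed call and the metric changes the outcome.
np.random.seed(42)
_ = np.random.rand(5)        # e.g. some unrelated library call
global_draw = np.random.rand(3)

# Robust: an explicit generator is self-contained, so two generators
# built from the same seed always produce identical streams.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
assert np.allclose(rng_a.random(3), rng_b.random(3))
```

The sklearn equivalent would be constructing each estimator with a fixed `random_state` (e.g. `GaussianMixture(n_components=10, random_state=0)`), which is why the fix has to happen inside the sdv/metric code rather than in the calling script.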
