Hi,

Thank you for this inspiring work!
I have a small question, if I may, regarding the noise regularization process.
In models trained under the DDPM objective (such as the one used in pix2pix-zero), the noise regularization makes sense, since the UNet is optimized to predict Gaussian noise with mean = 0 and var = 1. However, in distilled DMs, particularly those trained with Adversarial Diffusion Distillation (ADD), no such loss is part of the training scheme. As a result, the outputs of the time-distilled UNet in models like SD-turbo and SDXL-turbo do not necessarily follow an N(0, 1) distribution. In fact, the variance of the model's output decreases as t approaches 0.
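For concreteness, here is a minimal sketch of the kind of Gaussian-ness regularizer I mean. This is a simplified stand-in, not the exact pix2pix-zero loss (which, as I understand it, also uses a multi-scale autocorrelation term); the function name and the single-scale one-pixel shifts are just for illustration:

```python
import torch

def gaussian_reg_loss(noise_pred: torch.Tensor) -> torch.Tensor:
    # noise_pred: (B, C, H, W) UNet output.
    mu, var = noise_pred.mean(), noise_pred.var()
    # Proportional to KL(N(mu, var) || N(0, 1)): pushes mean -> 0, var -> 1.
    kl_term = mu.pow(2) + var - var.log() - 1.0
    # White Gaussian noise has no spatial autocorrelation, so also penalize
    # correlation with one-pixel shifts along H and W.
    ac_h = (noise_pred * torch.roll(noise_pred, shifts=1, dims=2)).mean().pow(2)
    ac_w = (noise_pred * torch.roll(noise_pred, shifts=1, dims=3)).mean().pow(2)
    return kl_term + ac_h + ac_w
```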
Here are some example logs (produced using SD-turbo with 4 inference steps) illustrating the decreasing variance:
timesteps 999.0: noise_pred mean = 0.0030574470292776823 | noise_pred var = 0.978159487247467
timesteps 749.0: noise_pred mean = 0.0017684325575828552 | noise_pred var = 0.9806435704231262
timesteps 499.0: noise_pred mean = -0.0025877265725284815 | noise_pred var = 0.877240777015686
timesteps 249.0: noise_pred mean = -0.0006634604651480913 | noise_pred var = 0.7107224464416504
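For reference, a rough sketch of how such statistics can be logged with a forward hook on the UNet (the prompt is arbitrary, and reading the timestep as the UNet's second positional argument is an assumption about the current diffusers call convention):

```python
import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

def log_noise_stats(module, inputs, output):
    t = inputs[1]  # timestep: second positional argument of the UNet forward
    pred = output[0] if isinstance(output, tuple) else output.sample
    print(f"timesteps {float(t)}: noise_pred mean = {pred.mean().item()} "
          f"| noise_pred var = {pred.var().item()}")

handle = pipe.unet.register_forward_hook(log_noise_stats)
pipe("a photo of a cat", num_inference_steps=4, guidance_scale=0.0)
handle.remove()
```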
Considering this, do you think noise regularization is still relevant for ADD-distilled models?
Thank you in advance for your time!
Or