Two forward passes in finetune_unet #139

gunshi opened this issue Oct 8, 2023 · 1 comment


gunshi commented Oct 8, 2023

Hey, thanks for open-sourcing this code!
I had a quick question about the finetune_unet function in train.py: why are there two forward passes and loss computations through the unet?
Is it to implement some sort of self-conditioning, which I've read about in some text-to-image diffusion papers (or could you point to the part of the paper this corresponds to)?
Thanks!

ExponentialML (Owner) commented

Hi, and thanks! Just to clarify, I'm not the original author of the code.

The two forward passes support the optional text-to-image training path, if the user chooses to enable it. In my tests, training the text encoder alongside video data (temporal information) does not work well, whereas sampling a single frame works much, much better.

An alternative is to concatenate the frames along the batch dimension before feeding them to the text encoder, but that would incur a significantly larger memory footprint.
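
For intuition, here's a minimal sketch of what a dual-pass step like this could look like. All of the names (`finetune_step`, `noise_scheduler`, the latent shape, and the diffusers-style `unet(...).sample` call) are assumptions for illustration, not the actual code from train.py:

```python
import torch
import torch.nn.functional as F

def finetune_step(unet, video_latents, text_embeds, timesteps, noise_scheduler):
    """One dual-pass training step (illustrative sketch, not the repo's code).

    video_latents: (batch, channels, frames, height, width)
    """
    # Pass 1: denoising loss over the full clip (spatial + temporal).
    noise = torch.randn_like(video_latents)
    noisy = noise_scheduler.add_noise(video_latents, noise, timesteps)
    pred = unet(noisy, timesteps, encoder_hidden_states=text_embeds).sample
    video_loss = F.mse_loss(pred.float(), noise.float())

    # Pass 2: sample a single frame and treat it as a one-frame "image",
    # so the text-conditioned path trains on spatial information only.
    idx = torch.randint(0, video_latents.shape[2], (1,)).item()
    frame = video_latents[:, :, idx:idx + 1]  # keep the frame dim at size 1
    noise_f = torch.randn_like(frame)
    noisy_f = noise_scheduler.add_noise(frame, noise_f, timesteps)
    pred_f = unet(noisy_f, timesteps, encoder_hidden_states=text_embeds).sample
    image_loss = F.mse_loss(pred_f.float(), noise_f.float())

    return video_loss + image_loss
```

The point is that the second pass only ever sees a single frame, so any text-conditioning gradients come from spatial data rather than temporal dynamics.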

Hope that clears it up!
