Two forward passes in finetune_unet #139

gunshi opened this issue Oct 8, 2023 · 1 comment


gunshi commented Oct 8, 2023

Hey, thanks for open-sourcing this code!
I had a quick question about the finetune_unet function in train.py: why are there two forward passes and loss computations through the unet?
Is it to implement some sort of self-conditioning, which I've read about in some text-to-image diffusion papers (or could you point to the part of the paper this corresponds to)?
Thanks!

ExponentialML (Owner) commented

Hi, and thanks! Just to clarify, I'm not the original author of the code.

The two forward passes support the optional text-to-image training path, if the user chooses to enable it. In my tests, training the text encoder alongside video data (temporal information) does not work well, whereas sampling a single frame works much, much better.

An alternative is to concatenate the frames along the batch dimension before feeding them to the text encoder, but that would incur a significantly larger memory footprint.
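
For intuition, here's a minimal sketch of what a dual-pass step like this could look like. All of the names (`finetune_step`, `noise_scheduler`, the latent shape, and the diffusers-style `unet(...).sample` call) are assumptions for illustration, not the actual code from train.py:

```python
import torch
import torch.nn.functional as F

def finetune_step(unet, video_latents, text_embeds, timesteps, noise_scheduler):
    """One dual-pass training step (illustrative sketch, not the repo's code).

    video_latents: (batch, channels, frames, height, width)
    """
    # Pass 1: denoising loss over the full clip (spatial + temporal).
    noise = torch.randn_like(video_latents)
    noisy = noise_scheduler.add_noise(video_latents, noise, timesteps)
    pred = unet(noisy, timesteps, encoder_hidden_states=text_embeds).sample
    video_loss = F.mse_loss(pred.float(), noise.float())

    # Pass 2: sample a single frame and treat it as a one-frame "image",
    # so the text-conditioned path trains on spatial information only.
    idx = torch.randint(0, video_latents.shape[2], (1,)).item()
    frame = video_latents[:, :, idx:idx + 1]  # keep the frame dim at size 1
    noise_f = torch.randn_like(frame)
    noisy_f = noise_scheduler.add_noise(frame, noise_f, timesteps)
    pred_f = unet(noisy_f, timesteps, encoder_hidden_states=text_embeds).sample
    image_loss = F.mse_loss(pred_f.float(), noise_f.float())

    return video_loss + image_loss
```

The point is that the second pass only ever sees a single frame, so any text-conditioning gradients come from spatial data rather than temporal dynamics.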

Hope that clears it up!
