The classifier-free guidance (CFG) scale is a value AI artists tinker with every day.
Classifier guidance is a way to incorporate **image labels** in diffusion models. You can use a label to guide the diffusion process. For example, the label "cat" steers the reverse diffusion process to generate photos of cats.
The classifier guidance scale is a parameter for controlling how closely the diffusion process should follow the label.
Here is an example. Suppose there are three groups of images labeled "cat", "dog", and "human". If the diffusion is unguided, the model draws samples from the full population of each group, and it may sometimes draw images that fit two labels, e.g. a boy petting a dog.
Classifier guidance. Left: unguided. Middle: small guidance scale. Right: large guidance scale.
With high classifier guidance, the images produced by the diffusion model are biased toward the extreme or unambiguous examples. If you ask the model for a cat, it will return an image that is unambiguously a cat and nothing else.
The classifier guidance scale controls how closely the guidance is followed. In the figure above, the sampling on the right has a higher classifier guidance scale than the one in the middle. In practice, this scale value is simply a multiplier on the drift term that pushes samples toward the data with that label.
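To make the "multiplier on the drift term" idea concrete, here is a minimal sketch in plain Python. It assumes the guided direction is the unguided direction plus the scale times a gradient supplied by the classifier, which is one common way to write classifier guidance. The function name `guided_score` and all the numbers are hypothetical, chosen only for illustration.

```python
def guided_score(score, classifier_grad, scale):
    """Add a scaled classifier gradient to the unguided score, element by element.

    score           -- unguided direction toward the data (toy values here)
    classifier_grad -- direction that makes the label more likely (toy values here)
    scale           -- classifier guidance scale; 0 means no guidance
    """
    return [s + scale * g for s, g in zip(score, classifier_grad)]


# Toy 2-D values, made up for illustration only.
score = [0.5, -1.0]
classifier_grad = [0.2, 0.4]

unguided = guided_score(score, classifier_grad, scale=0.0)
small = guided_score(score, classifier_grad, scale=1.0)
large = guided_score(score, classifier_grad, scale=5.0)
```

With a scale of 0 the classifier has no effect, and the larger the scale, the more the classifier's direction dominates each sampling step.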
Classifier guidance needs an extra model, an image classifier, to provide that guidance, and training this extra model has presented some difficulties.
Classifier-free guidance, in its authors' terms, is a way to achieve "classifier guidance without a classifier". Instead of using class labels and a separate model for guidance, they proposed to use image captions and train a conditional diffusion model, exactly like the conditioning in text-to-image.
They folded the classifier's role into the conditioning of the noise predictor U-Net, achieving the so-called "classifier-free" (i.e. without a separate image classifier) guidance in image generation.
The text prompt provides this guidance in text-to-image.
How do you control how closely a classifier-free diffusion process follows the guidance it receives through conditioning?
The classifier-free guidance (CFG) scale is a value that controls how much the text prompt conditions the diffusion process. Image generation is unconditioned (i.e. the prompt is ignored) when the scale is set to 0. A higher value steers the diffusion more strongly toward the prompt.
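As a rough illustration of how the scale enters the computation, the sketch below combines two noise predictions from the same U-Net: one made with the prompt and one made without it. It assumes the convention used by common Stable Diffusion implementations, where the final prediction is the unconditioned one plus the scale times the difference between the two. The original paper parameterizes the weight differently, so treat this as one convention rather than the only one. The function name `cfg_combine` and the toy numbers are hypothetical.

```python
def cfg_combine(noise_uncond, noise_cond, cfg_scale):
    """Blend unconditioned and prompt-conditioned noise predictions.

    cfg_scale = 0 returns the unconditioned prediction (prompt ignored),
    cfg_scale = 1 returns the conditioned prediction,
    cfg_scale > 1 extrapolates past it, away from the unconditioned one.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy 2-element "noise predictions", made up for illustration only.
noise_uncond = [0.0, 1.0]   # predicted without the prompt
noise_cond = [1.0, 3.0]     # predicted with the prompt

ignored = cfg_combine(noise_uncond, noise_cond, cfg_scale=0.0)
followed = cfg_combine(noise_uncond, noise_cond, cfg_scale=1.0)
pushed = cfg_combine(noise_uncond, noise_cond, cfg_scale=7.0)
```

Under this convention a scale of 0 reproduces the unconditioned prediction, which matches the behavior described above, and values above 1 push each step further in the direction the prompt suggests.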