
Adds video GAN framework #29

Open · wants to merge 1 commit into master
Conversation

daniel-j-h (Member)

Playground for video GANs. Right now this adds

  • video attention mechanisms: self-attention, simple attention, global context block
  • learned upsampling: conv+pixelshuffle+icnr
  • BigGAN's residual up/down building blocks
  • tensorboard integration: tracking losses and writing out generated video examples

References

TensorBoard logs can be viewed via

docker run -it --rm --network=host -v /tmp:/data tensorflow/tensorflow:2.0.0-py3 \
  tensorboard --bind_all --logdir /data/tbevents

Work in progress 🤗

@daniel-j-h (Member, Author)

The attention mechanisms are explained beautifully in https://arxiv.org/abs/1904.11492

[Figure: ctx, from https://arxiv.org/abs/1904.11492]

BigGAN and SAGAN use the expensive Non-Local block for attention. This changeset implements the Non-Local block (SelfAttention), the paper's simplified Non-Local block (SimpleSelfAttention), and the paper's proposed Global Context Block (GlobalContext), not for 2d but for 3d video models.
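For reference, a minimal sketch of what a 3d Global Context block can look like in PyTorch, following the paper's context-modelling + transform design; the class name, reduction ratio, and layer details are illustrative assumptions, not necessarily the exact code in this changeset.

import torch
import torch.nn as nn


class GlobalContext3d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # context modelling: a 1x1x1 conv produces per-voxel attention logits
        self.context = nn.Conv3d(channels, 1, kernel_size=1)
        # transform: bottleneck 1x1x1 convs with LayerNorm, as in the paper's GC block
        self.transform = nn.Sequential(
            nn.Conv3d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, t, h, w = x.shape
        # softmax over all T*H*W positions, then weighted pooling into one context vector
        attn = self.context(x).view(n, 1, t * h * w).softmax(dim=-1)
        ctx = torch.bmm(x.view(n, c, t * h * w), attn.transpose(1, 2)).view(n, c, 1, 1, 1)
        # transform the context and broadcast-add it back onto every voxel
        return x + self.transform(ctx)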

@daniel-j-h (Member, Author)

The learned upsampling via conv+pixelshuffle+icnr is explained in https://arxiv.org/abs/1707.02937

[Figure: shuf, from https://arxiv.org/abs/1707.02937]

This changeset implements the same idea for 3d: we can call it "voxel shuffle" 🤗 The idea is to treat the block as a sub-voxel convolution and initialize the conv volumes with ICNR appropriately. The figure above shows the situation in 2d; here we work with voxels instead of pixels, but the core idea is the same.
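A minimal sketch of the voxel shuffle idea in PyTorch: since torch only ships a 2d PixelShuffle, the shuffle is done with a reshape/permute, and ICNR makes the block start out as nearest neighbor upsampling. Names and defaults are illustrative assumptions, not necessarily the exact code in this changeset.

import torch
import torch.nn as nn


def voxel_shuffle(x, r):
    # (N, C*r^3, T, H, W) -> (N, C, T*r, H*r, W*r), the 3d analogue of pixel shuffle
    n, c, t, h, w = x.shape
    c_out = c // (r ** 3)
    x = x.view(n, c_out, r, r, r, t, h, w)
    x = x.permute(0, 1, 5, 2, 6, 3, 7, 4)
    return x.reshape(n, c_out, t * r, h * r, w * r)


def icnr_(weight, r):
    # initialize the conv so the freshly shuffled output equals nearest neighbor upsampling
    c_out = weight.shape[0] // (r ** 3)
    sub = torch.empty([c_out] + list(weight.shape[1:]))
    nn.init.kaiming_normal_(sub)
    weight.data.copy_(sub.repeat_interleave(r ** 3, dim=0))


class VoxelShuffleUp(nn.Module):
    def __init__(self, in_planes, out_planes, r=2):
        super().__init__()
        self.r = r
        self.conv = nn.Conv3d(in_planes, out_planes * r ** 3, kernel_size=3, padding=1)
        icnr_(self.conv.weight, r)

    def forward(self, x):
        return voxel_shuffle(self.conv(x), self.r)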

BigGAN by default uses simple nearest neighbor upsampling followed by conv layers (because it's cheap when scaling their models up to huge sizes I suppose?); one experiment will be to swap these learned upsampling layers in and see if they are any good in 3d.

@daniel-j-h (Member, Author)

The overall BigGAN / SAGAN'ish architecture is explained in https://arxiv.org/abs/1809.11096

[Figure: biggan1, from https://arxiv.org/abs/1809.11096]

For now we do not condition our GANs: embeddings, concatenations, and batchnorm modifications are out. The idea to split the noise z into chunks and feed them in at various layers (not just at the bottom of the generator) is interesting and something we can experiment with.
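A rough sketch of that hierarchical-z idea, purely for reference and not part of this changeset; the latent size and chunk count are arbitrary.

import torch

z = torch.randn(4, 120)                    # a batch of latents
z0, *zs = torch.chunk(z, chunks=6, dim=1)  # one 20-dim chunk per generator stage

# z0 would go through the initial linear layer; each remaining chunk would be
# concatenated onto the features entering its residual up-block (or drive
# conditional batchnorm once we add conditioning).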

As noted above: one experiment is to replace the simple nearest neighbor upsampling with our voxel shuffle block and see how that changes things.

Note: when implementing the discriminator blocks, the residual branch starts with a ReLU; in PyTorch this must not be an inplace operation, otherwise it would change the original input tensor, too!
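A minimal sketch of such a residual down-block illustrating the inplace pitfall; the exact layer layout is an assumption, not this changeset's code.

import torch.nn as nn


class ResBlockDown3d(nn.Module):
    def __init__(self, in_planes, out_planes):
        super().__init__()
        self.residual = nn.Sequential(
            nn.ReLU(inplace=False),  # inplace=True would mutate x before the skip branch reads it
            nn.Conv3d(in_planes, out_planes, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_planes, out_planes, kernel_size=3, padding=1),
            nn.AvgPool3d(kernel_size=2),
        )
        self.skip = nn.Sequential(
            nn.Conv3d(in_planes, out_planes, kernel_size=1),
            nn.AvgPool3d(kernel_size=2),
        )

    def forward(self, x):
        return self.residual(x) + self.skip(x)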

[Figure: biggan2, from https://arxiv.org/abs/1809.11096]

This describes the overall architecture for 128x128. Instead of BigGAN's self-attention, which is quite expensive, we can use our 3d global context blocks, which are cheaper, and then add more of them to our generator and discriminator.

We should start simple, e.g. with (TxHxW) 8x32x32 clips. One problem will be the difference in up/down-sampling rates: we need to decouple the time domain from the spatial domain. One experiment is to see where and how to up/down-sample time.
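A minimal sketch of decoupling time from space when up/down-sampling, assuming nearest neighbor upsampling and average pooling; the scale factors are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 64, 8, 32, 32)  # (N, C, T, H, W): a batch of 8x32x32 clips

up_space = F.interpolate(x, scale_factor=(1, 2, 2), mode="nearest")  # -> 8x64x64, time untouched
down_space = nn.AvgPool3d(kernel_size=(1, 2, 2))(x)                  # -> 8x16x16, time untouched
down_both = nn.AvgPool3d(kernel_size=2)(x)                           # -> 4x16x16, time halved too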

Here will be 🐉 This will be fun! 🤗

@daniel-j-h (Member, Author)

First somewhat reasonable results are in after a day of training (see below); observations:

  1. Losses are spiky but otherwise training seems to be stable! Maybe it's because of the hinge losses (see the sketch after this list); need to experiment here.

  2. My rig is currently bottlenecked by CPUs decoding the h264 videos and pre-processing frames; with a batch size of 432 clips (one clip has 32 frames right now) we need to produce 13824 frames per batch. We might want to look into NVIDIA's GPU-based h264 decoding and/or re-think the video dataloader we have right now. We might also start with 8-frame clips and adapt the up/down-sampling in the time domain.
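For reference, a minimal sketch of the hinge losses mentioned in observation 1, as used in SAGAN/BigGAN; the function names are illustrative.

import torch.nn.functional as F


def d_hinge_loss(d_real, d_fake):
    # discriminator: push real logits above +1 and fake logits below -1
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()


def g_hinge_loss(d_fake):
    # generator: raise the discriminator's logits on generated clips
    return -d_fake.mean()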

[Figure: loss0 — training loss curves]

[Figure: fake-3200 — generated video samples]
