Variational AutoEncoder - Keras implementation on MNIST and CIFAR10 datasets
- keras
- tensorflow / theano (the current implementation targets the TensorFlow backend; it can be used with Theano with a few changes to the code)
- numpy, matplotlib, scipy
the code is highly inspired by the Keras VAE examples (the source files contain some code duplication)
- images are flattened out and treated as 1D vectors
- the encoder and the decoder both have a standard fully connected neural network architecture (see the sketch below)
(figure: network architecture)
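Below is a minimal sketch of the dense VAE described above (Keras 2.x with the TensorFlow backend); the layer sizes, variable names and loss wiring are illustrative, not necessarily the exact ones used in this repository:

```python
import numpy as np
from keras import backend as K
from keras.layers import Input, Dense, Lambda
from keras.models import Model

original_dim = 784        # 28*28 MNIST images, flattened to 1D vectors
intermediate_dim = 256    # neurons in the single hidden layer (illustrative)
latent_dim = 2            # dimensionality of the latent space

# encoder: x -> (z_mean, z_log_var)
x = Input(shape=(original_dim,))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    # reparameterization trick: z = mu + sigma * epsilon
    mu, log_var = args
    epsilon = K.random_normal(shape=(K.shape(mu)[0], latent_dim))
    return mu + K.exp(0.5 * log_var) * epsilon

z = Lambda(sampling)([z_mean, z_log_var])

# decoder: z -> reconstructed x (layers kept as objects so they can be
# reused later in a standalone decoder model)
decoder_h = Dense(intermediate_dim, activation='relu')
decoder_out = Dense(original_dim, activation='sigmoid')
x_decoded = decoder_out(decoder_h(z))

vae = Model(x, x_decoded)

# VAE loss = reconstruction loss + KL divergence to the unit Gaussian prior
recon_loss = original_dim * K.mean(K.binary_crossentropy(x, x_decoded), axis=-1)
kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae.add_loss(K.mean(recon_loss + kl_loss))
vae.compile(optimizer='rmsprop')
```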
- it trains the VAE model according to the hyperparameters defined in `mnist_params.py`
- the entire VAE model, the encoder and the decoder are stored as Keras models in the directory as `ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_<vae/encoder/decoder>.h5`, where `<latent_dim>` is the number of latent dimensions, `<intermediate_dim>` is the number of neurons in the hidden layer and `<epochs>` is the number of training epochs
- after training, the saved models can be used to analyse the latent distribution and to generate new images
- it also stores the training history in `ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_history.pkl` (see the saving sketch below)
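A hedged sketch of how the training and saving step can look, continuing the dense VAE sketch above (`x`, `z_mean`, `decoder_h`, `decoder_out`, `latent_dim` and `intermediate_dim` are reused from there; the hyperparameter values are illustrative):

```python
import pickle
from keras.datasets import mnist
from keras.layers import Input
from keras.models import Model

epochs = 50

# flatten MNIST images to 1D vectors, as described above
(x_train, _), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

# loss was attached via add_loss, so no targets are passed to fit
history = vae.fit(x_train, epochs=epochs, batch_size=100)

encoder = Model(x, z_mean)                    # inputs -> latent means
decoder_in = Input(shape=(latent_dim,))       # standalone decoder model
decoder = Model(decoder_in, decoder_out(decoder_h(decoder_in)))

# file names follow the ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_* scheme
prefix = 'ld_%d_id_%d_e_%d' % (latent_dim, intermediate_dim, epochs)
vae.save(prefix + '_vae.h5')
encoder.save(prefix + '_encoder.h5')
decoder.save(prefix + '_decoder.h5')
with open(prefix + '_history.pkl', 'wb') as f:
    pickle.dump(history.history, f)           # training history, as described above
```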
- it is only for a 2-dimensional latent space
- it loads the trained model according to the hyperparameters defined in `mnist_params.py`
- it displays the latent space distribution and then generates images according to user input of the latent variables (the code is almost self-explanatory)
- it can also generate images from latent vectors randomly sampled from the 2D latent space (comment out the user-input lines) and display them in a grid, as sketched below
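A minimal sketch of interactive generation from a 2D latent space; the decoder file name below is illustrative and assumes the naming scheme described above:

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.models import load_model

# load a decoder saved with the ld_<latent_dim>_id_<intermediate_dim>_e_<epochs> scheme
decoder = load_model('ld_2_id_256_e_50_decoder.h5')

# read the two latent variables from the user and decode them into an image
z0 = float(input('latent variable 1: '))
z1 = float(input('latent variable 2: '))
img = decoder.predict(np.array([[z0, z1]])).reshape(28, 28)

plt.imshow(img, cmap='gray')
plt.show()
```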
- it is the same as `mnist_2d_latent_space_and_generate.py`, but for a 3D latent space
- it loads the trained model according to the hyperparameters defined in `mnist_params.py`
- if the latent space is either 2D or 3D, it displays it
- it displays a grid of images generated from randomly sampled latent vectors (see the sketch below)
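A sketch of the random-sampling grid: latent vectors are drawn from a standard normal prior and passed through the saved decoder (the file name, latent dimensionality and image shape are illustrative, assuming the dense MNIST model):

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.models import load_model

decoder = load_model('ld_2_id_256_e_50_decoder.h5')
latent_dim, n = 2, 5                              # 5x5 grid of samples

z = np.random.normal(size=(n * n, latent_dim))    # z ~ N(0, I)
images = decoder.predict(z).reshape(n * n, 28, 28)

fig, axes = plt.subplots(n, n, figsize=(6, 6))
for img, ax in zip(images, axes.flat):
    ax.imshow(img, cmap='gray')
    ax.axis('off')
plt.show()
```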
(figures: latent space distribution, uniform sampling and random sampling results)
- images are treated as 2D input
- the encoder is a convolutional neural network and the decoder is a deconvolutional network
- the network architectures for the encoder and the decoder are as follows (a hedged Keras sketch is given after the figures)
(figures: encoder and decoder network architectures)
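For illustration, a hedged Keras sketch of this convolutional/deconvolutional pairing; the filter counts and kernel sizes are assumptions, not the repository's exact architecture, and the loss can be wired exactly as in the dense MNIST sketch above:

```python
from keras import backend as K
from keras.layers import (Input, Conv2D, Conv2DTranspose, Flatten,
                          Dense, Reshape, Lambda)
from keras.models import Model

latent_dim = 2

# encoder: 32x32x3 CIFAR10 image -> (z_mean, z_log_var)
x = Input(shape=(32, 32, 3))
h = Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)   # -> 16x16
h = Conv2D(64, 3, strides=2, padding='same', activation='relu')(h)   # -> 8x8
h = Dense(128, activation='relu')(Flatten()(h))
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    mu, log_var = args
    eps = K.random_normal(shape=(K.shape(mu)[0], latent_dim))
    return mu + K.exp(0.5 * log_var) * eps

z = Lambda(sampling)([z_mean, z_log_var])

# decoder: latent vector -> 32x32x3 image, mirroring the encoder with
# transposed-convolution (deconvolution) layers
d = Dense(8 * 8 * 64, activation='relu')(z)
d = Reshape((8, 8, 64))(d)
d = Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(d)  # -> 16x16
d = Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(d)  # -> 32x32
x_decoded = Conv2DTranspose(3, 3, padding='same', activation='sigmoid')(d)

vae = Model(x, x_decoded)
```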
- the implementation structure is the same as for the MNIST files
(figures: images generated after 25, 50, 75 and 600 epochs of training)
`caltech101_<sz>_train.py` and `caltech101_<sz>_generate.py` (where `<sz>` is the size of the input image; here the training was done for two sizes, 92*92 and 128*128) are the same as the CIFAR10 dataset files
- as the image size is large, more computing power is needed to train the model
- results obtained with less training are qualitatively poor
- in the `dataset` directory, a script is provided to preprocess the dataset (a sketch follows)
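The preprocessing script itself is not reproduced here; below is a hedged sketch of what this kind of preprocessing typically does, using the old `scipy.misc` image helpers since scipy is listed as a requirement (the paths, target size and output file name are assumptions):

```python
import os
import numpy as np
from scipy import misc   # scipy.misc.imread/imresize from the old scipy API

sz = 92                               # target image size (92*92 or 128*128)
root = '101_ObjectCategories'         # hypothetical path to the raw Caltech101 data

images = []
for category in sorted(os.listdir(root)):
    cat_dir = os.path.join(root, category)
    if not os.path.isdir(cat_dir):
        continue
    for fname in sorted(os.listdir(cat_dir)):
        # force RGB so grayscale images get three channels too
        img = misc.imread(os.path.join(cat_dir, fname), mode='RGB')
        images.append(misc.imresize(img, (sz, sz)))

# stack into one array, scale to [0, 1] and save for training
data = np.stack(images).astype('float32') / 255.0
np.save('caltech101_%d.npy' % sz, data)
```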