You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Variational AutoEncoder - Keras implementation on mnist and cifar10 datasets
Dependencies
keras
tensorflow / theano (current implementation is according to tensorflow. It can be used with theano with few changes in code)
numpy, matplotlib, scipy
implementation Details
code is highly inspired from keras examples of vae : ,
(source files contains some code duplication)
MNIST
images are flatten out to treat them as 1D vectors
encoder and decoder - both have normal neural network architecture
network architecture
it trains vae model according to the hyperparameters defined in
entire vae model, encoder and decoder is stored as keras models in directory as ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_<vae/encoder/decoder>.h5 where <latent_dim> is number of latent dimensions, <intermediate_dim> is number of neurons in hidden layer and <epochs> is number of training epochs
after training, the saved model can be used to analyse the latent distribution and to generate new images
it also stores the training history in ld_<latent_dim>_id_<intermediate_dim>_e_<epochs>_history.pkl
it is only for 2 dimensional latent space
it loads trained model according to the hyperparameters defined in mnist_params.py
it displays the latent space distribution and then generates the images according to user input of latent variables (see the code as it is almost self-explanatory)
it can also generate images from latent vectors randomly sampled from 2D latent space (comment out the user input lines) and display them in a grid
it is same as mnist_2d_latent_space_and_generate.py but it is for 3d latent space
it loads trained model according to the hyperparameters defined in mnist_params.py
if latent space is either 2D or 3D, it displays it
it displays a grid of images generated from randomly sampled latent vectors
results
2D latent space
latent space
uniform sampling
3D latent space
3D latent space results
uniform sampling
random sampling
more results are in directory
CIFAR10
images are treated as 2D input
encoder has the architecture of convolutional neural network and decoder has the architecture of deconvolutional network
network architecture for encoder and decoder are as follows
encoder
decoder
,
implementation structure is same as mnist files
result - latent dimensions 16
25 epochs
50 epochs
75 epochs
600 epochs
CALTECH101
caltech101_<sz>_train.py and caltech101_<sz>_generate.py (where sz is the size of input image - here the training was done for two sizes - 92*92 and 128*128) are same as cifar10 dataset files
as the image size is large, more computation power is needed to train the model
results obtained with less training are qualitatively not good
in dataset directory, is provided to preprocess the dataset