Generating feature encodings using deep CNNs

Base code description

This code generates CNN features for Flickr8k images using ImageNet pre-trained CNNs from VGG. Since Flickr8k images do not have class labels, it is not possible to directly fine-tune a network on them. We therefore use a Gaussian Mixture Model (GMM) to adapt to the Flickr8k images in an unsupervised manner, and then use Fisher Vector (FV) encoding to obtain features that are, hopefully, better adapted to the Flickr8k dataset than the original ImageNet pre-trained network features. This rests on the assumption that there is a sufficient domain shift between ImageNet and Flickr.
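For reference, the Fisher Vector encoding mentioned above is commonly written as below. This is the standard formulation for a diagonal-covariance GMM as given in the VLFeat documentation; the symbols are notation introduced here for illustration, and the exact normalization used by this repository is not stated in this README. Assume N local CNN descriptors x_i of dimension D, and a GMM with K components having priors \pi_k, means \mu_k and per-dimension standard deviations \sigma_{jk}.

```latex
% Posterior (soft assignment) of descriptor x_i to GMM component k
q_{ik} = \frac{\pi_k \, \mathcal{N}(x_i ; \mu_k, \Sigma_k)}
              {\sum_{t=1}^{K} \pi_t \, \mathcal{N}(x_i ; \mu_t, \Sigma_t)}

% First-order (mean) statistics for dimension j and component k
u_{jk} = \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik} \,
         \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}

% Second-order (variance) statistics for dimension j and component k
v_{jk} = \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik}
         \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right]
```

Stacking all u_{jk} and v_{jk} gives a vector of dimension 2KD. Because the GMM is fitted to descriptors from the target images without labels, this is where the unsupervised adaptation enters.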

This code only extracts the features under various settings (details below). Training LSTMs to perform image captioning on the FV-CNN and regular CNN features is the next step, and is not in this repo at present.

The repository contains code using VLFEAT and MATCONVNET to:

- Train CNN models from scratch or fine-tune them on datasets
- Extract a variety of CNN features, including:
  - R-CNN: features from CNNs at various layers
  - D-CNN: CNN filterbanks with Fisher Vector pooling
  - B-CNN: bilinear CNN
- Run experiments on a variety of datasets

Getting started with the code

Compiling vlfeat

cd vlfeat
make MEX=/exp/comm/matlab-R2014b/bin/mex
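The MEX path above is specific to one machine. As a minimal sketch, the following check confirms that a mex binary is reachable before invoking make. The function name have_tool and the MEX_BIN variable are hypothetical names introduced for illustration; they are not part of vlfeat.

```shell
# Hypothetical helper: succeeds if the argument is an executable path
# or a command name that can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# MEX_BIN is an assumed variable name; the default is the path used above.
MEX_BIN="${MEX_BIN:-/exp/comm/matlab-R2014b/bin/mex}"

if have_tool "$MEX_BIN"; then
  echo "mex found: $MEX_BIN"
else
  echo "mex not found at $MEX_BIN; set MEX_BIN to the mex path of your MATLAB installation"
fi
```

If the check succeeds, the build can then be started from the vlfeat directory with make MEX="$MEX_BIN".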

Compiling matconvnet

Check out a stable release of matconvnet. On my Linux machines I found the v1.0-beta9 release to be stable. You can check it out with:

git fetch --tags
git checkout tags/v1.0-beta9
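To confirm that the working tree really sits on the intended release after the checkout, something like the following sketch can be used. The function name current_tag is hypothetical; it only wraps git describe.

```shell
# Hypothetical helper: print the tag that the given directory's HEAD points at,
# or a notice if HEAD is not exactly on a tag (or the directory is not a git repo).
current_tag() {
  git -C "${1:-.}" describe --tags --exact-match 2>/dev/null \
    || echo "not on a tagged release"
}

# Run from inside the matconvnet checkout.
current_tag .
```

After the checkout above, this would be expected to print v1.0-beta9.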

Edit the Makefile to reflect the paths of CUDA and MATLAB. For example, on my Linux machine I set the following:

ARCH ?= glnxa64
MATLABROOT ?= /exp/comm/matlab 
CUDAROOT ?= /usr/local/cuda-7.0

I have an NVIDIA K40 GPU, so I compiled matconvnet with GPU support. You may have to install libjpeg-dev to enable fast JPEG read support. On an Ubuntu machine this is easily done with:

sudo apt-get install libjpeg-dev

I compiled the code using the following flags:

make ENABLE_GPU=y ENABLE_IMREADJPEG=y
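As an alternative to editing the Makefile, variables given on the make command line take precedence over assignments inside the Makefile, so the same settings can probably be passed directly. The sketch below only assembles and prints such a command so it can be reviewed before running. It assumes the Makefile uses ARCH, MATLABROOT and CUDAROOT as in the excerpt above; the function name matconvnet_build_cmd is hypothetical.

```shell
# Hypothetical helper: print a make command line that passes the paths
# as command-line variables instead of editing the Makefile.
# Arguments: ARCH, MATLABROOT, CUDAROOT.
matconvnet_build_cmd() {
  printf 'make ARCH=%s MATLABROOT=%s CUDAROOT=%s ENABLE_GPU=y ENABLE_IMREADJPEG=y\n' \
    "$1" "$2" "$3"
}

# Values taken from the example settings above; adjust for your machine.
matconvnet_build_cmd glnxa64 /exp/comm/matlab /usr/local/cuda-7.0
```

The printed line can be copied into a shell in the matconvnet directory once the paths have been checked.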

Acknowledgements

The base code is taken from the Bilinear CNN project by Tsung-Yu Lin, Aruni RoyChowdhury and Subhransu Maji at UMass Amherst.

