
Automatic-Image-Captioning

A neural network architecture (CNN + LSTM) that automatically generates captions from images. The encoder is a CNN built on the ResNet architecture, while the DecoderRNN is trained from scratch with our choice of trainable parameters. I have trained the model on the Microsoft Common Objects in Context (MS COCO) dataset and have tested the network on fictitious images!
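
The README contains no code, so the following is a minimal PyTorch sketch of such a ResNet-based encoder. The class name `EncoderCNN`, the ResNet-50 variant, the use of pretrained ImageNet weights, and `embed_size` are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """CNN encoder: a ResNet backbone with its classification head
    replaced by a trainable linear layer that maps each image to a
    feature vector of length embed_size."""

    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)  # assumed variant/weights
        # Freeze the convolutional backbone; only the new head trains.
        for param in resnet.parameters():
            param.requires_grad_(False)
        # Drop the final fully connected (ImageNet classifier) layer.
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])
        self.embed = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):
        features = self.resnet(images)              # (batch, 2048, 1, 1)
        features = features.view(features.size(0), -1)
        return self.embed(features)                 # (batch, embed_size)
```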

Summary

  • The dataset used is the MS COCO dataset by Microsoft.

  • Feature vectors for images are generated with a CNN based on the ResNet architecture by Microsoft Research (see the encoder sketch above).

  • Word embeddings are learned from the captions of the training images; NLTK was used to tokenize and preprocess the captions (see the vocabulary sketch after this list).

  • Implemented an RNN decoder using LSTM cells (sketched below).

  • Trained the network on a GPU for more than 6 hours over 3 epochs to reach an average loss of about 2% (a training-loop sketch follows this list).

  • Generated captions for several test images to gauge how well the trained network performs (an inference sketch follows this list).
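
For the caption-preprocessing bullet, here is a minimal sketch of how a vocabulary might be built from the training captions with NLTK. The helper name `build_vocab`, the special tokens, and the frequency `threshold` are assumptions for illustration.

```python
import nltk
from collections import Counter

nltk.download("punkt", quiet=True)  # tokenizer models for word_tokenize

def build_vocab(captions, threshold=5):
    """Tokenize captions with NLTK and keep words occurring at least
    `threshold` times; rarer words map to <unk>."""
    counter = Counter()
    for caption in captions:
        counter.update(nltk.tokenize.word_tokenize(caption.lower()))
    # Reserve indices for the special tokens the decoder relies on.
    word2idx = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for word, count in counter.items():
        if count >= threshold:
            word2idx[word] = len(word2idx)
    return word2idx

# Usage: encode one caption as a sequence of word indices.
vocab = build_vocab(["A dog runs on the beach."], threshold=1)
tokens = ["<start>"] + nltk.tokenize.word_tokenize("a dog runs .") + ["<end>"]
ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
```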
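
The decoder bullet is one line, so here is a minimal PyTorch sketch of an LSTM decoder of this kind. The layer sizes and the convention of feeding the image feature as the first LSTM input are assumptions.

```python
import torch
import torch.nn as nn

class DecoderRNN(nn.Module):
    """LSTM decoder: embeds caption tokens, prepends the image feature
    as the first time step, and predicts the next word at every step."""

    def __init__(self, embed_size, hidden_size, vocab_size, num_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Drop the <end> token from the inputs; it is only a target.
        embeddings = self.embedding(captions[:, :-1])
        # The image feature acts as the first "word" of the sequence.
        inputs = torch.cat((features.unsqueeze(1), embeddings), dim=1)
        hiddens, _ = self.lstm(inputs)
        return self.fc(hiddens)         # (batch, seq_len, vocab_size)
```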
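
For the training bullet, a sketch of one possible training loop, reusing the `EncoderCNN` and `DecoderRNN` sketches above. The `data_loader`, the Adam optimizer, the learning rate, and the layer sizes are assumptions; only the 3 epochs and GPU use come from the summary.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
encoder = EncoderCNN(embed_size=256).to(device)
decoder = DecoderRNN(embed_size=256, hidden_size=512,
                     vocab_size=len(vocab)).to(device)

criterion = nn.CrossEntropyLoss()
# Only the decoder and the encoder's new embedding head are trainable.
params = list(decoder.parameters()) + list(encoder.embed.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

for epoch in range(3):  # the summary reports 3 epochs
    for images, captions in data_loader:  # assumed COCO (image, caption) batches
        images, captions = images.to(device), captions.to(device)
        outputs = decoder(encoder(images), captions)
        # Flatten (batch, seq, vocab) vs. (batch, seq) for cross entropy.
        loss = criterion(outputs.view(-1, outputs.size(-1)),
                         captions.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```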
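
Finally, for the test-image bullet, a sketch of greedy decoding at inference time: the decoder's most likely word is fed back in at each step until `<end>` is produced. The helper name `greedy_caption` and the 20-step cap are illustrative.

```python
import torch

def greedy_caption(encoder, decoder, image, vocab, max_len=20):
    """Caption one preprocessed image tensor (C, H, W) greedily."""
    idx2word = {i: w for w, i in vocab.items()}
    words, states = [], None
    with torch.no_grad():
        # The image feature is the first LSTM input: (1, 1, embed_size).
        inputs = encoder(image.unsqueeze(0)).unsqueeze(1)
        for _ in range(max_len):
            hiddens, states = decoder.lstm(inputs, states)
            scores = decoder.fc(hiddens.squeeze(1))   # (1, vocab_size)
            predicted = scores.argmax(dim=1)
            word = idx2word[predicted.item()]
            if word == "<end>":
                break
            words.append(word)
            inputs = decoder.embedding(predicted).unsqueeze(1)
    return " ".join(words)
```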
