This project aims to bring the Terzani Photo collection online, enabling both text search and image search across the collection.
This project uses MongoDB, a NoSQL database. Users are advised to create a cloud database and provide its URI as described below.
After the Python script uploads the data to the database, the tag collection requires a search index before text-based search can be performed. A search index can be created on the collection by following the steps shown here. After step 6, replace the JSON document with the document below and continue with the remaining steps.
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "tag": [
        {
          "foldDiacritics": false,
          "maxGrams": 7,
          "minGrams": 3,
          "tokenization": "edgeGram",
          "type": "autocomplete"
        },
        {
          "analyzer": "lucene.standard",
          "multi": {
            "keywordAnalyzer": {
              "analyzer": "lucene.keyword",
              "type": "string"
            }
          },
          "type": "string"
        }
      ]
    }
  }
}
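Once the index exists, a text search on the tag collection can be sketched as an aggregation pipeline with a `$search` stage. The sketch below only builds the pipeline and does not connect to a database. The index name `default`, the helper name `build_tag_search_pipeline`, and the result limit are assumptions made for illustration and are not taken from this project.

```python
from typing import Any, Dict, List


def build_tag_search_pipeline(
    query: str, index_name: str = "default", limit: int = 10
) -> List[Dict[str, Any]]:
    """Build an Atlas Search pipeline that autocompletes on the `tag` field.

    `index_name` and `limit` are illustrative assumptions; use the name
    chosen when the search index was created.
    """
    if not query:
        raise ValueError("query must be a non-empty string")
    return [
        {
            "$search": {
                "index": index_name,
                "autocomplete": {"query": query, "path": "tag"},
            }
        },
        {"$limit": limit},
    ]


# Usage (needs a live Atlas cluster, so it is left commented out):
# import os
# from pymongo import MongoClient
# tags = MongoClient(os.environ["MONGO_URI"])["<db_name>"]["<tag_collection_name>"]
# results = list(tags.aggregate(build_tag_search_pipeline("mount")))
```

The `autocomplete` operator targets the first mapping of the `tag` field above; the second (`string`) mapping could be queried with the `text` operator instead.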
To use the Google Vision API to retrieve annotations, the user has to provide the API credentials as GOOGLE_APPLICATION_CREDENTIALS in a .env file.
To use the online MongoDB service to retrieve or upload data, the user has to provide the connection URI as MONGO_URI in a .env file.
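As a quick sanity check before running the scripts, the presence of both settings can be verified with a few lines of standard-library Python. This is a minimal sketch, not the project's own loading code: the helper names `parse_env_text` and `missing_keys` are hypothetical, and the parser handles only plain `KEY=VALUE` lines.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# The two settings named in this README.
REQUIRED_KEYS = ("GOOGLE_APPLICATION_CREDENTIALS", "MONGO_URI")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and `#` comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(
    values: Dict[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    print("Missing settings:", missing_keys(parse_env_text(text)))
```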
This project requires Python >= 3.8.
Use the package manager pip to install the dependencies.
pip install -r requirements.txt
Set up the project:
python setup.py install
Photo annotation on a collection can be performed using the create_database.py script in the scripts/dataprocessing directory. A JSON configuration file must be provided to run the script.
{
  "data_folder": "<Folder to store data>",
  "scrap_image_iiif": "<Boolean to indicate if the image information needs to be scraped from an IIIF server>",
  "collection_url": "<URL of the collection>",
  "unsupported_collections": "<List of (string) collection ids to be ignored>",
  "col_cntry_json": "<Path to a JSON file providing the country mapping for each collection>",
  "server": "<URL of the IIIF server containing the collection>",
  "manifest": "<Path to the manifest JSON on the server>",
  "annotate_iiif": "<Boolean to indicate if the images need to be annotated using the Vision API>",
  "update_db": "<Boolean to indicate if the database needs to be updated>",
  "db_name": "<Name of the Mongo database>",
  "tag_collection_name": "<Name of the image tag collection in the Mongo database>",
  "annotation_collection_name": "<Name of the image annotation collection in the Mongo database>",
  "fvector_collection_name": "<Name of the image feature vector collection in the Mongo database>",
  "nu_photos": "<Number of images to be processed. If left empty or set to 'full', all images are processed. If a number is provided, that many images are randomly sampled from the collection>"
}
python scripts/dataprocessing/create_database.py -c "./config.json"
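Before launching the script, it can help to check the configuration file for missing keys and to see how the `nu_photos` rule plays out. The following is a minimal sketch based only on the key descriptions above. The helper names (`load_config`, `missing_config_keys`, `sample_photos`) are hypothetical, and how create_database.py actually interprets these values is an assumption.

```python
import json
import random
from typing import Any, Dict, List, Optional, Sequence

# Keys listed in the configuration template above.
EXPECTED_KEYS = {
    "data_folder", "scrap_image_iiif", "collection_url",
    "unsupported_collections", "col_cntry_json", "server", "manifest",
    "annotate_iiif", "update_db", "db_name", "tag_collection_name",
    "annotation_collection_name", "fvector_collection_name", "nu_photos",
}


def load_config(path: str) -> Dict[str, Any]:
    """Read the JSON configuration file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_config_keys(config: Dict[str, Any]) -> List[str]:
    """Return the expected keys that the configuration does not define."""
    return sorted(EXPECTED_KEYS - config.keys())


def sample_photos(
    photo_ids: Sequence[str], nu_photos: Any, seed: Optional[int] = None
) -> List[str]:
    """Return every photo for empty or 'full', else a random sample.

    Assumed behaviour: a number larger than the collection is capped
    at the collection size.
    """
    if nu_photos in (None, "", "full"):
        return list(photo_ids)
    count = min(int(nu_photos), len(photo_ids))
    return random.Random(seed).sample(list(photo_ids), count)
```

For example, `missing_config_keys(load_config("./config.json"))` returns an empty list when every key from the template is present.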
The following components of the server in website/server.py need to be changed to match your setup.
- sample_annotations = mongo.db["Name of the Mongo collection containing the annotations"]
- sample_tags = mongo.db["Name of the Mongo collection containing the tags"]
- sample_imageVectors = mongo.db["Name of the Mongo collection containing the feature vectors"]
python website/server.py
If an error occurs on the website, the user should restart the server.
We have observed that the Python script sometimes fails to download the pretrained model for colorization. In such cases, users need to download the .pth file from here and place it in website/DeOldify/models/ with the name ColorizeStable_gen.pth.
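Whether the weights file is in place can be checked with a short script before starting the server. This is only a sketch: the helper name `model_is_present` is hypothetical, and the check confirms that a non-empty file exists at the expected path, not that the file is a valid model.

```python
from pathlib import Path

# Location and file name given in this README, relative to the project root.
MODEL_PATH = Path("website/DeOldify/models/ColorizeStable_gen.pth")


def model_is_present(project_root: Path = Path(".")) -> bool:
    """Return True if the weights file exists and is non-empty."""
    candidate = project_root / MODEL_PATH
    return candidate.is_file() and candidate.stat().st_size > 0


if __name__ == "__main__":
    if model_is_present():
        print("Colorization model found.")
    else:
        print(f"Model missing: place the .pth file at {MODEL_PATH}")
```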