Latest Update: 19/04/2024
This is the companion repo of the Medium blog post "Create multimodal conversational experiences with Google Cloud Dialogflow CX and Gemini Vision". If you haven't read the article, my first advice is to start there.
This repo contains a working prototype of the agent described in the article, mainly for learning purposes. Feel free to check out the repo and reuse it according to the licensing terms.
This repo contains three main artifacts:
- An intent-based Dialogflow CX agent;
- A Google Cloud Function, invoked by the agent via a webhook;
- A webapp (in my case I've deployed it on Google App Engine), embedding a Dialogflow Messenger widget, to interact with the agent deployed on the back-end.
The aforementioned artifacts implement the following architecture.
This is the procedure to set up the whole prototype. Feel free to change some of the configurations according to your needs.
Disclaimer: this code has been validated with Gemini 1.0 Pro Vision.
You need to insert your project ID and selected region within main.py:
project_id = 'your-project-id'
region = 'your-region'
After editing the file, package both Cloud Function files (main.py and requirements.txt) into a single ZIP archive (more info on the archive structure here).
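If you prefer to script the packaging step, here is a minimal sketch using only the Python standard library. The helper name build_function_archive and the archive name function-source.zip are illustrative assumptions, not part of this repo. The point is that both files sit at the root of the archive, not inside a subfolder.

```python
import zipfile
from pathlib import Path


def build_function_archive(source_dir, archive_path,
                           files=("main.py", "requirements.txt")):
    """Write the given files to the root of a ZIP archive.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    source_dir = Path(source_dir)
    archive_path = Path(archive_path)

    # Check everything up front so a missing file never leaves a partial archive.
    missing = [name for name in files if not (source_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {source_dir}: {', '.join(missing)}")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in files:
            # arcname keeps each file at the archive root (no parent folders).
            archive.write(source_dir / name, arcname=name)
    return archive_path
```

For example, calling build_function_archive(".", "function-source.zip") from the folder that holds main.py would produce an archive you can upload as the function source.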
The settings for your cloud function should be:
- Function Name: driving-license-webhook
- Environment: 1st Gen
- Region: your desired region
- Runtime: Python 3.12
- Timeout: 60 seconds (this is very important, since Gemini takes some time to return)
- Entry Point: validate_driving_license
- Allow all traffic (being a demo, I've relaxed my security requirements)
- Source: the ZIP file you've just created
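To make the Entry Point setting more concrete, here is a rough sketch of the shape an HTTP entry point for a Dialogflow CX webhook usually takes. This is not the code shipped in main.py: the body is a placeholder that only echoes what it received, and the real function calls Gemini instead. The field names (fulfillmentInfo, sessionInfo, fulfillment_response) follow the Dialogflow CX webhook JSON format as I understand it.

```python
import json


def validate_driving_license(request):
    """Placeholder entry point; `request` is the Flask request passed by Cloud Functions."""
    body = request.get_json(silent=True) or {}

    # The webhook tag configured in the agent and the parameters collected so far.
    tag = body.get("fulfillmentInfo", {}).get("tag", "")
    parameters = body.get("sessionInfo", {}).get("parameters", {})

    # Assumption: the real implementation would call Gemini here and build
    # its answer from the model output. This sketch only echoes the input.
    reply = f"Received webhook call with tag '{tag}' and {len(parameters)} session parameter(s)."

    response = {
        "fulfillment_response": {
            "messages": [{"text": {"text": [reply]}}]
        }
    }
    return json.dumps(response), 200, {"Content-Type": "application/json"}
```

The function name must match the Entry Point value exactly, otherwise the deployment will not find it.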
Feel free to deploy the cloud function using the Google Cloud Console or the gcloud command. More info here.
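If you go the gcloud route, the settings listed above map roughly onto the flags below. This is a sketch under a few assumptions: the helper name build_deploy_command is made up for illustration, the function uses an HTTP trigger, and --source points at the local folder containing main.py and requirements.txt, since as far as I know gcloud expects a directory (or a Cloud Storage ZIP) and not a local ZIP file. Double-check the flags against the gcloud reference before running the command.

```python
import shlex


def build_deploy_command(region, source_dir="."):
    """Assemble a gcloud deploy command mirroring the settings in this README.

    Hypothetical helper; it only builds the argument list and runs nothing.
    """
    return [
        "gcloud", "functions", "deploy", "driving-license-webhook",
        "--no-gen2",                               # Environment: 1st Gen
        f"--region={region}",                      # Region: your desired region
        "--runtime=python312",                     # Runtime: Python 3.12
        "--timeout=60s",                           # Timeout: 60 seconds
        "--entry-point=validate_driving_license",  # Entry Point
        "--trigger-http",                          # assumption: HTTP-triggered webhook
        "--ingress-settings=all",                  # Allow all traffic
        f"--source={source_dir}",                  # assumption: local source folder
    ]


def format_command(args):
    """Return the command as a single shell-safe string for copy and paste."""
    return shlex.join(args)
```

Printing format_command(build_deploy_command("your-region")) gives a one-line command you can review and paste into a terminal.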