Create multimodal conversational experiences with Google Cloud Dialogflow CX and Gemini Vision

Latest Update: 19/04/2024

Intro

This is the companion repo for the Medium article "Create multimodal conversational experiences with Google Cloud Dialogflow CX and Gemini Vision". If you haven't read the article, my advice is to start there.

This repo contains a working prototype of the agent described in the article, mainly for learning purposes. Feel free to check out the repo and reuse it according to the licensing terms.

Repo Structure

This repo contains three main artifacts:

An intent-based Dialogflow CX agent;
A Google Cloud Function, invoked by the agent via a webhook;
A web app (deployed on Google App Engine in my case) that embeds a Dialogflow Messenger widget for interacting with the agent on the back end.

Architecture

The aforementioned artifacts implement the following architecture.

Setup

This is the procedure to set up the whole prototype. Feel free to change some of the configurations according to your needs.

Google Cloud Functions

Disclaimer: this code has been validated with Gemini 1.0 Pro Vision.

You need to insert your project ID and selected region within main.py:

project_id = 'your-project-id'
region = 'your-region'
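
As an optional variation on the hard-coded values above, here is a minimal sketch that reads the two settings from environment variables and fails fast if the placeholders were never replaced. The variable names GCP_PROJECT_ID and GCP_REGION and the load_settings helper are assumptions made for illustration; they are not part of the repo's main.py.

```python
import os

# Placeholder values as they appear in main.py before editing.
PLACEHOLDERS = {"your-project-id", "your-region"}


def load_settings(env=None):
    """Return (project_id, region), rejecting unedited placeholder values.

    GCP_PROJECT_ID and GCP_REGION are hypothetical variable names chosen
    for this sketch; rename them to whatever fits your deployment.
    """
    env = os.environ if env is None else env
    project_id = env.get("GCP_PROJECT_ID", "your-project-id")
    region = env.get("GCP_REGION", "your-region")
    for name, value in (("project_id", project_id), ("region", region)):
        if value in PLACEHOLDERS:
            raise ValueError(f"{name} is still a placeholder: {value!r}")
    return project_id, region
```

Calling load_settings() once at the top of the function module would surface a forgotten edit at startup rather than on the first request to Gemini.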

After editing the file, compress both Cloud Function files (main.py and requirements.txt) into a single ZIP archive (more info on the archive structure here).
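
The archiving step can also be scripted. The sketch below uses only the Python standard library and places both files at the root of the archive, on the assumption that the function source should not be nested inside a parent folder. The helper name build_source_zip and the output file name are hypothetical.

```python
import zipfile
from pathlib import Path

# The two files this README asks you to package.
FUNCTION_FILES = ("main.py", "requirements.txt")


def build_source_zip(source_dir, zip_path):
    """Write the function files into zip_path, directly at the archive root."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in FUNCTION_FILES:
            # arcname=name drops the directory prefix, so the file is stored
            # as "main.py" rather than "cloudfunction/main.py".
            archive.write(source_dir / name, arcname=name)
    return Path(zip_path)
```

For example, build_source_zip("cloudfunction", "function-source.zip") run from the repo root would package the contents of the cloudfunction folder.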

The settings for your cloud function should be:

Function Name: driving-license-webhook
Environment: 1st Gen
Region: your desired region
Runtime: Python 3.12
Timeout: 60 seconds (this is important, since Gemini can take a while to respond)
Entry Point: validate_driving_license
Allow all traffic (being a demo, I've relaxed my security requirements)
Source: the ZIP file you've just created

Feel free to deploy the cloud function using either the Google Cloud Console or the gcloud command. More info here.
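
For the gcloud route, the settings listed above map roughly onto the command assembled in the sketch below. It only builds and prints the command, it does not run it. Two assumptions to note: the source is given as the local cloudfunction directory rather than the ZIP file, and authentication flags are left out, so gcloud may prompt for them. Check the flag names against your installed gcloud version before relying on them.

```python
import shlex


def build_deploy_command(region, source_dir="cloudfunction"):
    """Assemble a gcloud deploy command mirroring the settings in this README."""
    return [
        "gcloud", "functions", "deploy", "driving-license-webhook",
        "--no-gen2",                               # Environment: 1st Gen
        f"--region={region}",                      # Region: your desired region
        "--runtime=python312",                     # Runtime: Python 3.12
        "--timeout=60s",                           # Timeout: 60 seconds
        "--entry-point=validate_driving_license",  # Entry Point
        "--trigger-http",                          # invoked by the agent's webhook over HTTP
        "--ingress-settings=all",                  # Allow all traffic (demo only)
        f"--source={source_dir}",                  # assumption: local directory, not the ZIP
    ]


# Print a copy-pasteable command line; replace the region placeholder first.
print(shlex.join(build_deploy_command("your-region")))
```

Keeping the command as a list makes it easy to pass to subprocess.run later without shell quoting issues.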

Repository: grandelli/dfcx-geminiprovision
