Flan-T5

Intro

This repo containerizes Flan-T5 into a serving container using FastAPI.

Since this uses Hugging Face's transformers library, specifically the AutoModelForSeq2SeqLM class, it should be able to load other sequence-to-sequence models supported by that class.
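For reference, a minimal sketch of loading and running such a model with transformers (the actual serving code in this repo may differ):

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Any model supported by AutoModelForSeq2SeqLM can be substituted here.
    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

    # Tokenize a prompt, generate, and decode the result.
    inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))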

The model license can be found here.

Features:

  • Text generation.
  • Language translation.
  • Sentiment analysis.
  • Text classification.
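Flan-T5 is instruction-tuned, so each task above is selected purely by how the prompt is worded. A few illustrative prompts (the phrasing is flexible, not a fixed API):

    # Illustrative prompts only; Flan-T5 follows natural-language instructions.
    prompts = [
        "Write a short story about a robot learning to paint.",         # text generation
        "Translate English to French: The weather is nice today.",      # translation
        "Is the following review positive or negative? I loved it.",    # sentiment analysis
        "Classify the topic of this sentence: The stock market fell.",  # text classification
    ]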

Setup

  1. Clone the repo if you haven't already. Navigate to the serving-flant5 folder.

  2. Build the container. Don't forget to change {project_id} to your own project ID.

    docker build --build-arg model_name=google/flan-t5-large . -t gcr.io/{project_id}/serving-t5:latest
  3. Run the container. You need NVIDIA Docker and a GPU.

    docker run -d -p 80:8080 --gpus all -e AIP_HEALTH_ROUTE=/health -e AIP_HTTP_PORT=8080 -e AIP_PREDICT_ROUTE=/predict gcr.io/{project_id}/serving-t5:latest
  4. Make predictions; a sketch of the raw request follows this list.

    python test_container.py
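test_container.py wraps the request for you. To hit the container directly, something like the following should work, assuming the app follows the Vertex AI custom-container convention of a JSON body with an "instances" list (the "prompt" key is a guess; check test_container.py for the exact schema):

    import requests

    # Health check; the route was set via AIP_HEALTH_ROUTE above.
    print(requests.get("http://localhost:80/health").status_code)

    # The {"instances": [...]} shape is the Vertex AI convention; the
    # "prompt" field name is an assumption about this app's schema.
    resp = requests.post(
        "http://localhost:80/predict",
        json={"instances": [{"prompt": "Translate English to German: Good morning."}]},
    )
    print(resp.json())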

Deploy in Vertex AI

You'll need the Vertex AI API enabled and to be authenticated with a service account that has the Vertex AI Admin or Editor role.

  1. Push the image

    gcloud auth configure-docker
    docker push gcr.io/{project_id}/serving-t5:latest
  2. Deploy in Vertex AI Endpoints

    python ../gcp_deploy.py --image-uri gcr.io/{project_id}/serving-t5:latest --machine-type n1-standard-8 --model-name flant5 --endpoint-name flant5-endpoint --endpoint-deployed-name flant5-deployed-name
    
  3. Test the endpoint; an equivalent SDK call is sketched after this list.

    python generate_request_vertex.py
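generate_request_vertex.py builds the request for you; an equivalent call through the google-cloud-aiplatform SDK looks roughly like this (project, region, and the instance schema are placeholders to adjust):

    from google.cloud import aiplatform

    # Placeholders: substitute your project ID and region.
    aiplatform.init(project="{project_id}", location="us-central1")

    # Look up the endpoint deployed above by its display name.
    endpoint = aiplatform.Endpoint.list(filter='display_name="flant5-endpoint"')[0]

    # The instance payload must match the container's schema;
    # the "prompt" key here is an assumption.
    response = endpoint.predict(
        instances=[{"prompt": "Translate English to German: Good morning."}]
    )
    print(response.predictions)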