Update Mistral notebook to use restricted docker image.
PiperOrigin-RevId: 666512207
vertex-mg-bot authored and copybara-github committed Aug 23, 2024
1 parent 92f352d commit c019622
Showing 1 changed file with 14 additions and 11 deletions.
@@ -4,6 +4,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "iJc36RtD90jd"
},
"outputs": [],
@@ -95,7 +96,11 @@
"\n",
"# @markdown 2. [Optional] [Create a Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) for storing experiment outputs. Set the BUCKET_URI for the experiment environment. The specified Cloud Storage bucket (`BUCKET_URI`) should be located in the same region as where the notebook was launched. Note that a multi-region bucket (eg. \"us\") is not considered a match for a single region covered by the multi-region range (eg. \"us-central1\"). If not set, a unique GCS bucket will be created instead.\n",
"\n",
"# @markdown 3. [Make sure that you have GPU quota for Vertex Training (finetuning) and Vertex Prediction (serving)](https://cloud.google.com/docs/quotas/view-manage). The quota name for Vertex Training is \"Custom model training your-gpu-type per region\" and the quota name for Vertex Prediction is \"Custom model serving your-gpu-type per region\" such as `Custom model training Nvidia L4 GPUs per region` and `Custom model serving Nvidia L4 GPUs per region` for L4 GPUs. [Submit a quota increase request](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota) if additional quota is needed. At minimum, running this notebook requires 4 A100 80 GB for finetuning and 1 L4 for serving. More GPUs may be needed for larger models and different finetuning configurations. To secure GPUs for larger models, ask your customer engineer to get you allowlisted for a Shared Reservation or a Dynamic Workload Scheduler.\n",
"# @markdown 3. [Make sure that you have GPU quota for Vertex Training (finetuning) and Vertex Prediction (serving)](https://cloud.google.com/docs/quotas/view-manage). The quota name for Vertex Training is \"Custom model training your-gpu-type per region\" and the quota name for Vertex Prediction is \"Custom model serving your-gpu-type per region\" such as `Custom model training Nvidia L4 GPUs per region` and `Custom model serving Nvidia L4 GPUs per region` for L4 GPUs. [Submit a quota increase request](https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota) if additional quota is needed.\n",
"# @markdown\n",
"# @markdown By default, running this notebook requires 8 A100 80 GB for finetuning and 1 L4 for serving. The finetuning quota is available in `us-central1`.\n",
"# @markdown\n",
"# @markdown More GPUs may be needed for larger models and different finetuning configurations. To secure GPUs for larger models, ask your customer engineer to get you allowlisted for a Shared Reservation or a Dynamic Workload Scheduler.\n",
"\n",
"# Import the necessary packages\n",
"! git clone https://github.com/GoogleCloudPlatform/vertex-ai-samples.git\n",
@@ -267,12 +272,12 @@
"pretrained_model_id = f\"gs://vertex-model-garden-public-us/{base_model_id}\"\n",
"\n",
"# The pre-built training docker image.\n",
"TRAIN_DOCKER_URI = \"us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-peft-train:20240724_0936_RC00\"\n",
"TRAIN_DOCKER_URI = \"us-docker.pkg.dev/vertex-ai-restricted/vertex-vision-model-garden-dockers/pytorch-peft-train:20240724_0936_RC00\"\n",
"\n",
"# @markdown Batch size for finetuning.\n",
"per_device_train_batch_size = 1 # @param{type:\"integer\"}\n",
"# @markdown Number of updates steps to accumulate the gradients for, before performing a backward/update pass.\n",
"gradient_accumulation_steps = 8 # @param{type:\"integer\"}\n",
"gradient_accumulation_steps = 4 # @param{type:\"integer\"}\n",
"# @markdown Maximum sequence length.\n",
"max_seq_length = 4096 # @param{type:\"integer\"}\n",
"# @markdown Setting a positive `max_steps` here will override `num_epochs`.\n",
@@ -308,12 +313,9 @@
"# Worker pool spec for 4bit finetuning.\n",
"accelerator_type = \"NVIDIA_A100_80GB\" # @param[\"NVIDIA_A100_80GB\"]\n",
"\n",
"if accelerator_type == \"NVIDIA_L4\":\n",
" accelerator_count = 4\n",
" machine_type = \"g2-standard-48\"\n",
"elif accelerator_type == \"NVIDIA_A100_80GB\":\n",
" accelerator_count = 4\n",
" machine_type = \"a2-ultragpu-4g\"\n",
"if accelerator_type == \"NVIDIA_A100_80GB\":\n",
" accelerator_count = 8\n",
" machine_type = \"a2-ultragpu-8g\"\n",
"else:\n",
" raise ValueError(f\"Unsupported accelerator type: {accelerator_type}\")\n",
"\n",
@@ -325,6 +327,7 @@
" accelerator_type=accelerator_type,\n",
" accelerator_count=accelerator_count,\n",
" is_for_training=True,\n",
" is_restricted_image=True,\n",
")\n",
"\n",
"# Setup training job.\n",
@@ -381,7 +384,6 @@
" f\"--template={template}\",\n",
"] + eval_args\n",
"\n",
"\n",
"# Create TensorBoard\n",
"tensorboard = aiplatform.Tensorboard.create(job_name)\n",
"exp = aiplatform.TensorboardExperiment.create(\n",
@@ -393,10 +395,11 @@
" container_uri=TRAIN_DOCKER_URI,\n",
")\n",
"\n",
"print(\"Running training job with args:\")\n",
"print(\" \\\\\\n\".join(train_job_args))\n",
"# Pass training arguments and launch job.\n",
"train_job.run(\n",
" args=train_job_args,\n",
" environment_variables={\"WANDB_DISABLED\": True},\n",
" replica_count=replica_count,\n",
" machine_type=machine_type,\n",
" accelerator_type=accelerator_type,\n",
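Taken together, the changed hunks switch the training image to the restricted registry, move finetuning from a 4x to an 8x A100 80GB machine shape (dropping the L4 path), and halve gradient accumulation so the effective global batch size stays at 32 (1 x 8 x 4, versus 1 x 4 x 8 before). Below is a minimal sketch of how those pieces fit together after this commit, assuming the Vertex AI SDK has already been initialized with a project, region, and staging bucket as the notebook's setup cell does; `job_name`, `replica_count`, and `train_job_args` are placeholder values standing in for variables the notebook builds elsewhere, not part of this diff.

# Minimal sketch of the post-commit training configuration.
# Assumes aiplatform.init(project=..., location="us-central1", staging_bucket=...)
# has already been called, as in the notebook's setup cell.
from google.cloud import aiplatform

# Pre-built training image, now pulled from the restricted registry.
TRAIN_DOCKER_URI = (
    "us-docker.pkg.dev/vertex-ai-restricted/"
    "vertex-vision-model-garden-dockers/pytorch-peft-train:20240724_0936_RC00"
)

# Batch settings after this change: accelerator_count doubled and
# gradient_accumulation_steps halved, so 1 * 8 * 4 = 32 samples per update.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4

# Only the 8x A100 80GB shape is offered for finetuning after this change.
accelerator_type = "NVIDIA_A100_80GB"
if accelerator_type == "NVIDIA_A100_80GB":
    accelerator_count = 8
    machine_type = "a2-ultragpu-8g"
else:
    raise ValueError(f"Unsupported accelerator type: {accelerator_type}")

# Placeholders (assumptions) for values the notebook computes elsewhere.
job_name = "mistral-peft-train"
replica_count = 1
train_job_args = ["--example_arg=value"]  # the notebook assembles the real list

print("Running training job with args:")
print(" \\\n".join(train_job_args))

# Set up and launch the custom training job on the restricted image.
train_job = aiplatform.CustomContainerTrainingJob(
    display_name=job_name,
    container_uri=TRAIN_DOCKER_URI,
)
train_job.run(
    args=train_job_args,
    replica_count=replica_count,
    machine_type=machine_type,
    accelerator_type=accelerator_type,
    accelerator_count=accelerator_count,
)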
