Added docs for raw deployment autoscaling.
Signed-off-by: Andrews Arokiam <[email protected]>
andyi2it committed Oct 26, 2023
1 parent 551763b commit 480f4d3
Showing 1 changed file with 89 additions and 2 deletions.
91 changes: 89 additions & 2 deletions docs/modelserving/autoscaling/autoscaling.md
@@ -1,6 +1,8 @@
# Autoscale InferenceService with inference workload

## InferenceService with target concurrency
## Autoscaler for KServe's Serverless

### InferenceService with target concurrency

### Create `InferenceService`

@@ -492,4 +494,89 @@ This allows more flexibility in terms of the autoscaling configuration. In a typ
- mnist
```
Apply the `autoscale-adv.yaml` to create the Autoscale InferenceService.
The default for scaleMetric is `concurrency` and possible values are `concurrency`, `rps`, `cpu` and `memory`.

## Autoscaler for KServe's Raw Deployment Mode

KServe supports `RawDeployment` mode to enable `InferenceService` deployment with Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Compared to serverless deployment, it lifts Knative limitations such as the inability to mount multiple volumes. On the other hand, scaling down to and up from zero is not supported in `RawDeployment` mode.

### HPA in Raw Deployment

When using KServe in `RawDeployment` mode, Knative is not installed. In this mode, when you deploy an `InferenceService`, KServe uses the **Kubernetes Horizontal Pod Autoscaler (HPA)** for autoscaling instead of the **Knative Pod Autoscaler (KPA)**. For more information about KServe's autoscaler, refer to [this page](https://kserve.github.io/website/master/modelserving/v1beta1/torchserve/#knative-autoscaler).


=== "Old Schema"

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-hpa"
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: hpa
    serving.kserve.io/metric: cpu
    serving.kserve.io/targetUtilizationPercentage: "80"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

=== "New Schema"

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-hpa"
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: hpa
    serving.kserve.io/metric: cpu
    serving.kserve.io/targetUtilizationPercentage: "80"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```
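
For orientation, the annotations above correspond to fields of a standard Kubernetes HPA. The manifest below is a rough sketch of an equivalent object, not the exact resource KServe generates: the `autoscaling/v2` API version, the `sklearn-iris-hpa-predictor` names, and the replica bounds are assumptions made for illustration.

```yaml
# Hypothetical HPA roughly equivalent to the annotations above.
# Names and replica bounds are illustrative assumptions, not KServe output.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sklearn-iris-hpa-predictor      # assumed HPA name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sklearn-iris-hpa-predictor    # assumed name of the predictor Deployment
  minReplicas: 1                        # assumed lower bound
  maxReplicas: 3                        # assumed upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu                       # from serving.kserve.io/metric
        target:
          type: Utilization
          averageUtilization: 80        # from serving.kserve.io/targetUtilizationPercentage
```

To see what was actually created in your cluster, inspect the live object with `kubectl get hpa -o yaml` and compare it against this sketch.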

### Disable HPA in Raw Deployment

If you want to control the scaling of the deployment created by a KServe `InferenceService` with an external tool like [`KEDA`](https://keda.sh/), you can disable KServe's creation of the **HPA** by setting the autoscaler class annotation `serving.kserve.io/autoscalerClass` to `external`.

=== "Old Schema"

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: external
  name: "sklearn-iris"
spec:
  predictor:
    sklearn:
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```

=== "New Schema"

```yaml
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: external
  name: "sklearn-iris"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "gs://kfserving-examples/models/sklearn/1.0/model"
```
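
With the `external` autoscaler class set, an external tool can take over scaling of the predictor Deployment. As a minimal sketch of what that might look like with KEDA, the `ScaledObject` below scales on CPU utilization. The object name, the target Deployment name `sklearn-iris-predictor`, the replica bounds, and the 80% threshold are all assumptions; check the Deployment name KServe actually created in your cluster before using anything like this.

```yaml
# Hypothetical KEDA ScaledObject; not part of KServe itself.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sklearn-iris-scaledobject     # hypothetical name
spec:
  scaleTargetRef:
    name: sklearn-iris-predictor      # assumed name of the Deployment created by KServe
  minReplicaCount: 1                  # assumed lower bound
  maxReplicaCount: 3                  # assumed upper bound
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "80"                   # assumed target CPU utilization (percent)
```

KEDA manages its own HPA for the targets it scales, which is why KServe's HPA has to be disabled first so that two autoscalers do not compete over the same Deployment.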
