Added docs for raw deployment autoscaling. #312

Open
wants to merge 2 commits into
base: main

andyi2it
Contributor

@andyi2it andyi2it commented Nov 6, 2023

Fixes #303. Updates the autoscaling docs for Raw Deployment mode.

Proposed Changes

netlify bot commented Nov 6, 2023

Deploy Preview for elastic-nobel-0aef7a ready!

Name Link
🔨 Latest commit 8135ecd
🔍 Latest deploy log https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/6548ac23ad6ec4000887d949
😎 Deploy Preview https://deploy-preview-312--elastic-nobel-0aef7a.netlify.app

@kserve-oss-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: andyi2it
To complete the pull request process, please assign theofpa after the PR has been reviewed.
You can assign the PR to them by writing /assign @theofpa in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +515 to +519
serving.kserve.io/deploymentMode: RawDeployment
serving.kserve.io/autoscalerClass: hpa
serving.kserve.io/metric: cpu
serving.kserve.io/targetUtilizationPercentage: "80"
Member

these are the annotations for the old schema

Member

Also document the supported metric types for RawDeployment mode.
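
For context, here is a rough sketch of how the same settings might look if the "new schema" mentioned above refers to the component-level fields on the predictor rather than the `serving.kserve.io/metric` and `serving.kserve.io/targetUtilizationPercentage` annotations. This is an assumption, not something confirmed in this thread: `scaleMetric` appears in the docs under review, while `scaleTarget`, the replica bounds, the resource name, and the storage URI are illustrative placeholders.

```yaml
# Sketch only: assumes the newer schema uses predictor-level fields.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-raw-hpa            # hypothetical name
  annotations:
    # Deployment mode and autoscaler class are still set as annotations.
    serving.kserve.io/deploymentMode: RawDeployment
    serving.kserve.io/autoscalerClass: hpa
spec:
  predictor:
    minReplicas: 1                 # illustrative bounds
    maxReplicas: 3
    scaleMetric: cpu               # replaces the serving.kserve.io/metric annotation (assumed)
    scaleTarget: 80                # replaces targetUtilizationPercentage (assumed)
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://<your-bucket>/<model-path>   # placeholder
```

Which values of `scaleMetric` are actually honored in RawDeployment mode is exactly what the second comment asks the docs to spell out, so the `cpu` value here should be checked against the KServe API reference.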

### HPA in Raw Deployment

When using KServe in `RawDeployment` mode, Knative is not installed. In this mode, if you deploy an `InferenceService`, KServe uses the **Kubernetes Horizontal Pod Autoscaler (HPA)** for autoscaling instead of the **Knative Pod Autoscaler (KPA)**. For more information about KServe's autoscaler, see [this page](https://kserve.github.io/website/master/modelserving/v1beta1/torchserve/#knative-autoscaler).
Member

better to refer to the official Knative autoscaler doc.

The default for scaleMetric is `concurrency` and possible values are `concurrency`, `rps`, `cpu` and `memory`.

## Autoscaler for KServe's Raw Deployment Mode
Member

Maybe worth a separate page for this; this doc is a bit too long.

Successfully merging this pull request may close these issues.

Update Autoscaling docs for KServe Raw Deployment Mode
3 participants