[bug] Triton BLS scripting fails with model not ready #3299
Comments
I notice that a load-model API has been added. From what I've read, an ensemble automatically loads the models it uses (since Triton can infer that from the config), but not necessarily when you dynamically invoke another model. So I suspect that for whatever reason the 'add_sub' model hasn't been loaded at the point when I invoke it. Unfortunately, it looks like I may have to wait until 23.09 is released by NVIDIA and then added to SageMaker if I want to explicitly load the dependent models from the BLS (i.e. in the initialize method).
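For future reference, here's a minimal sketch of what that explicit load could look like, assuming the model-loading API (`pb_utils.load_model` / `pb_utils.is_model_ready`) described in the Triton Python backend docs, and a server running with explicit model control:

```python
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # Sketch only: pb_utils.load_model / pb_utils.is_model_ready come
        # from the Triton Python backend model-loading API (newer releases)
        # and require the server to run with --model-control-mode=explicit.
        for dependency in ("add_sub", "pytorch"):
            if not pb_utils.is_model_ready(model_name=dependency):
                pb_utils.load_model(model_name=dependency)
```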
I found a solution based on https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/business_logic_scripting/stable_diffusion/model_repository/pipeline/1/model.py. There's a note in the accompanying notebook that confirms my hypothesis above: you need to explicitly load the models being invoked via BLS. This is a bit of a workaround; I would suggest explicitly adding […]
Right, to load the BLS model explicitly, along with the other models that it refers to, they'd have to be loaded at once using the […]. SM Triton v23.06 onwards enables the […]
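Concretely, something along these lines (a sketch only; `SAGEMAKER_TRITON_DEFAULT_MODEL_NAME` is documented for the SageMaker Triton containers, while the additional-args variable name should be verified against the container release notes):

```python
import boto3

sm = boto3.client("sagemaker")

# Sketch: load the BLS model and the models it calls in one container.
# The role ARN, image URI and model data URL are placeholders.
sm.create_model(
    ModelName="bls-sync",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/SageMakerRole",
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:23.06-py3",
        "ModelDataUrl": "s3://my-bucket/model_repository.tar.gz",
        "Environment": {
            "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "bls_sync",
            # Assumed pass-through for extra tritonserver flags; verify
            # the exact variable name against the container docs.
            "SAGEMAKER_TRITON_ADDITIONAL_ARGS": "--load-model=add_sub --load-model=pytorch",
        },
    },
)
```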
Thanks, migrating to v23.06+ is something I intend to do, but I haven't had time to create and test a py310 conda-pack environment, which has held me back.
@nskool I'm having issues getting this to work with v23.06, i.e. using the following to define the model works fine with 23.02: […]

But when I tried 23.06 with the setup below, it only loaded the pipeline model; the log doesn't even mention the encoder at all (this model uses the […]).

If I start Triton locally using the […]

Also, when I use the same […]

It's only when I move the […] that it works.
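To make the workaround concrete, the environment ends up looking roughly like this (a sketch only; the model names come from the stable-diffusion example linked above, and the exact variable that tolerates the appended flags is precisely what's in question):

```python
# Sketch of the workaround described above: the --load-model flags are
# appended to the log-info variable rather than the additional-args one.
# Every key and value here is a reconstruction, not a verified config.
environment = {
    "SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "pipeline",
    # Appending the flags here is the hack; putting them in
    # SAGEMAKER_TRITON_ADDITIONAL_ARGS only loaded the pipeline model.
    "SAGEMAKER_TRITON_LOG_INFO": "false --load-model=text_encoder",
}
```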
@david-waterworth I will look into this; perhaps there is some ordering issue with the additional_args vs. the log_info workaround. Please continue using the workaround. I will update the thread once I have more info.
@nskool any updates on the thread above?
So, is the above hack with […] still necessary?
@FrikadelleHelle it's the only way I've got it to work! I've updated the Triton containers a couple of times and this doesn't appear to have been addressed (@nskool can you please confirm?).
Thanks for commenting on this stale issue @david-waterworth! For me, it seems I can deploy one BLS version in MME mode, but when I call other models it seems SageMaker has forgotten the […]
@FrikadelleHelle no, I didn't manage to get it to work; I suspect because it doesn't fit the general SageMaker interface. The other thing that's frustrated me is that Triton has a compressed binary request/response format, but I'm using SageMaker async endpoints, so as far as I can tell there's no way to use it properly: you don't have access to the Triton server request/response headers (the SageMaker async invocation doesn't let you pass additional headers to be included in the Triton request). I feel that to get the best out of Triton you probably need to host it yourself (i.e. on an EC2 instance).
Would that be solved if you were to set up a custom reverse proxy? (See the bring-your-own-container instructions for SageMaker, where they show how to set up a custom nginx.config.) In theory, that would declaw SageMaker's way of interacting with Triton running on a SageMaker endpoint and simply pass your request through as-is, to be handled by your custom reverse proxy.
@jadhosn the issue I have is specific to batch and async inference. In both of these configurations the actual model endpoint (i.e. Triton) is invoked by AWS, so you cannot pass headers, i.e. when you call […]
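For context, this is roughly what using the binary extension looks like against a real-time SageMaker Triton endpoint, based on the AWS samples (note they reach into a private tritonclient helper whose signature may vary by version). It works there because the header length rides inside the Content-Type; async invocation gives you no equivalent channel on the response side:

```python
import boto3
import numpy as np
import tritonclient.http as httpclient

# Build a binary-tensor request body; header_length tells the server where
# the JSON header ends and the raw tensor bytes begin.
inputs = [httpclient.InferInput("INPUT0", [1, 16], "FP32")]
inputs[0].set_data_from_numpy(
    np.random.rand(1, 16).astype(np.float32), binary_data=True)

# Private helper used by the AWS SageMaker Triton samples; the exact
# module path and signature may differ across tritonclient versions.
request_body, header_length = httpclient._utils._get_inference_request(
    inputs=inputs, outputs=None, request_id="", sequence_id=0,
    sequence_start=False, sequence_end=False, priority=0, timeout=None)

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-triton-endpoint",  # placeholder
    # The header length is smuggled through the Content-Type, since
    # SageMaker exposes no other request-header channel.
    ContentType="application/vnd.sagemaker-triton.binary+json;json-header-size="
    + str(header_length),
    Body=request_body,
)
```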
Checklist
Concise Description:
I'm using Triton BLS scripting to script another model. The problem occurs when I use the official NVIDIA Triton examples, specifically the Sync BLS Model.
This example consists of three models: "bls_sync", "add_sub" and "pytorch". The default model is "bls_sync" (all three use the Python backend). When "bls_sync" is invoked, it forwards the request to one of the other two models using the following code:
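Condensed from the linked example, the forwarding logic is (where `request` is one element of the batch passed to `execute()`):

```python
import triton_python_backend_utils as pb_utils

# "bls_sync" reads a MODEL_NAME string input and forwards INPUT0/INPUT1
# to that model ("add_sub" or "pytorch").
model_name = pb_utils.get_input_tensor_by_name(request, "MODEL_NAME").as_numpy()[0]
in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

infer_request = pb_utils.InferenceRequest(
    model_name=model_name,
    requested_output_names=["OUTPUT0", "OUTPUT1"],
    inputs=[in_0, in_1],
)
infer_response = infer_request.exec()  # this is the call that fails on SageMaker
```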
This works fine using the NVIDIA Triton image (v23.02), but I cannot get it to work on SageMaker; the last line above is what fails.
DLC image/dockerfile:
I'm using
355873309152.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-tritonserver:23.02-py3
Current behavior:
I've tried two alternatives. The first is to define the default model and invoke the endpoint directly, roughly along the lines of the sketch below (resource names, shapes and payload values are illustrative):
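```python
import json
import boto3

# Sketch of the first alternative: single-model mode, with the BLS model
# set as the default via the container environment.
container = {
    "Image": "355873309152.dkr.ecr.ap-southeast-2.amazonaws.com/sagemaker-tritonserver:23.02-py3",
    "ModelDataUrl": "s3://my-bucket/model_repository.tar.gz",  # placeholder
    "Environment": {"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "bls_sync"},
}
# ... create_model / create_endpoint_config / create_endpoint as usual ...

# Then a plain invoke using Triton's KServe-style JSON payload; the input
# names follow the NVIDIA example, the shapes/values are illustrative.
runtime = boto3.client("sagemaker-runtime")
payload = {
    "inputs": [
        {"name": "MODEL_NAME", "shape": [1], "datatype": "BYTES", "data": ["add_sub"]},
        {"name": "INPUT0", "shape": [4], "datatype": "FP32", "data": [1.0, 2.0, 3.0, 4.0]},
        {"name": "INPUT1", "shape": [4], "datatype": "FP32", "data": [4.0, 3.0, 2.0, 1.0]},
    ]
}
response = runtime.invoke_endpoint(
    EndpointName="bls-sync-endpoint",  # placeholder
    ContentType="application/json",
    Body=json.dumps(payload),
)
```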
This results in the `infer_request.exec()` call from `bls_model` to `add_sub` failing. The alternative I've tried is to set "MultiModel" mode and pass `TargetModel="bls_sync"`, but this also errors.
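Roughly, that second attempt was (endpoint name is a placeholder; `payload` as in the sketch above):

```python
# Sketch of the second alternative: a multi-model endpoint where
# TargetModel selects an archive under the endpoint's S3 prefix. Note that
# in MME mode TargetModel resolves to a .tar.gz in S3, not to a model
# inside an already-loaded Triton repository, which may explain the error.
response = runtime.invoke_endpoint(
    EndpointName="triton-mme-endpoint",  # placeholder, Mode="MultiModel"
    TargetModel="bls_sync",
    ContentType="application/json",
    Body=json.dumps(payload),
)
```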
Neither behaves in the same manner as the NVIDIA Triton container. Triton ensemble models work fine, but BLS scripting doesn't, nor does it appear possible to invoke different models from a Triton model repository deployed to SageMaker (the MultiModel option appears to download a repository rather than invoke a model in the existing repository?).
Expected behavior:
An inference request between models in the same Triton container should work in the same manner as it does using the official NVIDIA Triton image.
Additional context: