
Add IAM cred caching for OIDC flow #3712

Merged

Conversation

@Manouchehri (Collaborator) commented May 17, 2024

Title

This should improve performance, as it avoids making an extra call to STS on every invocation of Bedrock.

Should be merged after #3688. I'm doing this as a separate PR in case you don't like my caching method; I don't want it to hold up #3688 from being merged.

Type

🆕 New Feature

Changes

Just uses DualCache to speed things up!
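At a high level the flow looks roughly like this (a minimal sketch, not the exact diff; the helper name, cache-key format, and the DualCache `get_cache`/`set_cache` usage are assumptions on my part):

```python
import json

import boto3
from litellm.caching import DualCache

iam_cache = DualCache()


def get_bedrock_iam_creds(aws_role_name, aws_session_name, oidc_token, aws_region_name):
    # Hypothetical helper illustrating the idea: reuse temporary IAM creds
    # instead of calling sts:AssumeRoleWithWebIdentity on every Bedrock call.
    cache_key = f"role={aws_role_name};session={aws_session_name}"

    cached = iam_cache.get_cache(key=cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no STS round trip

    sts_client = boto3.client("sts", region_name=aws_region_name)
    sts_response = sts_client.assume_role_with_web_identity(
        RoleArn=aws_role_name,
        RoleSessionName=aws_session_name,
        WebIdentityToken=oidc_token,
        DurationSeconds=3600,
    )
    iam_creds_dict = {
        "aws_access_key_id": sts_response["Credentials"]["AccessKeyId"],
        "aws_secret_access_key": sts_response["Credentials"]["SecretAccessKey"],
        "aws_session_token": sts_response["Credentials"]["SessionToken"],
        "region_name": aws_region_name,
    }
    # Cache for slightly less than the credential lifetime so entries expire
    # before the credentials themselves do.
    iam_cache.set_cache(key=cache_key, value=json.dumps(iam_creds_dict), ttl=3600 - 60)
    return iam_creds_dict
```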

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

I ran the same request multiple times, and could see the cache being used in LiteLLM's logs. (It's kinda difficult to post a screenshot without including a lot of PII.)

[screenshot: LiteLLM logs showing the cache being hit on repeated requests]


@krrishdholakia (Contributor) left a comment

#3858 blocks this PR, since the refactoring to make the call async will require a refactor here.

@krrishdholakia (Contributor) commented

@Manouchehri please can we add some tests for this? A couple of users use the STS flow and I want to make sure there are no regressions.

@Manouchehri (Collaborator, Author) commented

> @Manouchehri please can we add some tests for this? A couple of users use the STS flow and I want to make sure there are no regressions.

Definitely, that's a good idea; I'll do that. We sort of already test it today, but more by luck than by design.

Two questions:

  1. How can I validate that the cache is actually working and not just issuing a new token?
  2. How can I make LiteLLM do only one STS call at a time? There's no point in fetching multiple tokens in parallel (it's no faster); it just wastes resources and adds noise. (See the sketch below for one possible pattern.)
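For question 2, one common pattern (not something this PR implements, and the helper below is purely illustrative) is to guard the STS call with a lock and re-check the cache after acquiring it:

```python
import asyncio

_sts_lock = asyncio.Lock()  # a per-cache-key lock would be finer-grained


async def get_creds_once(cache, cache_key: str, fetch_from_sts):
    """Serialize STS calls so concurrent requests don't each assume the role.

    `cache` is assumed to expose get_cache/set_cache; `fetch_from_sts` is an
    async callable that performs the actual AssumeRoleWithWebIdentity call.
    """
    cached = cache.get_cache(key=cache_key)
    if cached is not None:
        return cached

    async with _sts_lock:
        # Re-check after acquiring the lock: another request may have
        # populated the cache while we were waiting.
        cached = cache.get_cache(key=cache_key)
        if cached is not None:
            return cached

        creds = await fetch_from_sts()
        cache.set_cache(key=cache_key, value=creds, ttl=3600 - 60)
        return creds
```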

@Manouchehri Manouchehri marked this pull request as draft June 1, 2024 02:07
@Manouchehri Manouchehri force-pushed the oidc-bedrock-httpx-caching-part-1 branch from 212c953 to 3410367 on June 1, 2024 15:25
@Manouchehri Manouchehri marked this pull request as ready for review June 1, 2024 15:38
@Manouchehri (Collaborator, Author) commented

This shaves a lot of time off the current setup, roughly 0.25 to 1 second per Bedrock call. :)

The STS fix is also needed.

Ready for merging!

@Manouchehri (Collaborator, Author) commented

Bump on this? It's becoming a bit of a burden to maintain out-of-tree patches for weeks.

@krrishdholakia (Contributor) commented

Hey @Manouchehri, missed this - sorry for the delay on my end.

Saw you made a couple of changes - could you walk me through them?

@krrishdholakia (Contributor) commented Jun 11, 2024

i'm also happy to hop on a call and help get all the oidc pr's done

https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

@Manouchehri (Collaborator, Author) commented

> i'm also happy to hop on a call and help get all the oidc pr's done

Gimme a moment to finish testing this and #3861. =)

@Manouchehri (Collaborator, Author) commented

You free now?


Review comment on `iam_cache = DualCache()`:

Contributor: switch to in memory cache

Contributor: InMemoryCache - `class InMemoryCache(BaseCache):`
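If the suggestion is taken, the change is small - something like the following sketch (assuming InMemoryCache is importable from litellm.caching and exposes the same get_cache/set_cache interface as DualCache):

```python
from litellm.caching import InMemoryCache

# Per the review suggestion: the temporary IAM credentials are process-local
# and short-lived, so a plain in-memory cache is enough; there's no need to
# write them to Redis via DualCache.
iam_cache = InMemoryCache()

# Usage stays the same as with DualCache:
# cached = iam_cache.get_cache(key=iam_creds_cache_key)
# iam_cache.set_cache(key=iam_creds_cache_key, value=json.dumps(iam_creds_dict), ttl=3600 - 60)
```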

Review comment on:

    aws_session_token=sts_response["Credentials"]["SessionToken"],
    region_name=aws_region_name,
    )
    iam_cache.set_cache(key=iam_creds_cache_key, value=json.dumps(iam_creds_dict), ttl=3600 - 60)

Contributor: can we have ttl be an enum at the top of the file, so it's easier to know
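One way to do that (the enum name and member below are hypothetical, not the merged code):

```python
from enum import Enum


class AwsIamCacheTTL(Enum):
    # Keep the TTL at the top of the file so it's easy to find. STS credentials
    # from AssumeRoleWithWebIdentity last 3600s by default; cache them for
    # slightly less so we never hand out credentials that are about to expire.
    STS_CREDENTIALS = 3600 - 60


# ...and at the call site:
# iam_cache.set_cache(
#     key=iam_creds_cache_key,
#     value=json.dumps(iam_creds_dict),
#     ttl=AwsIamCacheTTL.STS_CREDENTIALS.value,
# )
```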

Review comment on:

    and aws_session_name is not None
    ):
    oidc_token = get_secret(aws_web_identity_token)
    if aws_web_identity_token is not None and aws_role_name is not None and aws_session_name is not None:

Contributor: can you add a comment explaining when you'd enter this flow
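Something along these lines would address it, reusing the names from the hunk above (the comment wording is a suggestion, not the merged text):

```python
# OIDC / web-identity flow: entered only when the caller has configured all of
# aws_web_identity_token, aws_role_name, and aws_session_name. In that case the
# OIDC token is exchanged for temporary IAM credentials via
# sts:AssumeRoleWithWebIdentity instead of using static AWS keys.
if aws_web_identity_token is not None and aws_role_name is not None and aws_session_name is not None:
    oidc_token = get_secret(aws_web_identity_token)
```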

@krrishdholakia krrishdholakia merged commit 422db5c into BerriAI:main Jun 12, 2024
3 checks passed
@Manouchehri (Collaborator, Author) commented

Self note: I can confirm this is indeed working; I now see only one AssumeRoleWithWebIdentity event before multiple Bedrock calls. =)

[screenshot: a single AssumeRoleWithWebIdentity event preceding multiple Bedrock calls]

Previously (before this PR was merged), I had multiple AssumeRoleWithWebIdentity calls (one for each Bedrock invocation), which is way too much overhead.

[screenshot: multiple AssumeRoleWithWebIdentity events, one per Bedrock call]
