Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache object stores per scheme + bucket #99

Open
wants to merge 2 commits into
base: aykut/numeric-with-larger-scale
Choose a base branch
from

Conversation

aykut-bozkurt
Copy link
Collaborator

@aykut-bozkurt aykut-bozkurt commented Jan 16, 2025

In this PR, we cache object stores per scheme (e.g. s3) + bucket (or container) combination in each Postgres session. This will reduce authentication costs by only doing it at the first time.

object_store does not perform sts assume_role to get temp token, so pg_parquet make use of aws sdk to perform it. And then configure object_store with the temp token that it fetched. This is why, pg_parquet checks expiration of the tokens for s3 store. It will fetch the temp token if it expires. (obviously if you configured temp token auth via config)

For azure blob store, we do not need to recreate tokens because object_store handles it.

Closes #93

@@ -4,5 +4,6 @@ trap "echo 'Caught termination signal. Exiting...'; exit 0" SIGINT SIGTERM

# create azurite container
az storage container create -n $AZURE_TEST_CONTAINER_NAME --connection-string $AZURE_STORAGE_CONNECTION_STRING
az storage container create -n ${AZURE_TEST_CONTAINER_NAME}2 --connection-string $AZURE_STORAGE_CONNECTION_STRING
Copy link
Collaborator Author

@aykut-bozkurt aykut-bozkurt Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created one more container to test object store cache per (schema, container) pair.

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/cache-object-stores branch from 39bbefe to 17d4aa8 Compare January 16, 2025 20:19
let expiration_warn_msg =
format!("credentials for {bucket} expired at {expire_at:?}");

let expiration_hint = "New credentials will be automatically retrieved if configured.\n\
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if new credentials will be automatically retrieved, do we need to throw a warning?

Copy link
Collaborator Author

@aykut-bozkurt aykut-bozkurt Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanted to give hint to refresh the session but seems like a bad spot. (retry should not warn if it succeeds)


if let Some(item) = self.cache.get(&key) {
if item.expired(&key.bucket) {
self.cache.remove(&key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this release all the memory associated with the object store?

Copy link
Collaborator Author

@aykut-bozkurt aykut-bozkurt Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, impl Drop is called for each field in the structs.

@aykut-bozkurt aykut-bozkurt force-pushed the aykut/numeric-with-larger-scale branch from b8f9061 to 80f9a11 Compare January 22, 2025 14:00
In this PR, we cache object stores per scheme (e.g. s3) + bucket (container) combination
in each Postgres session. This will reduce authentication costs by only doing it at the first
time.

object_store does not perform sts assume_role to get temp token, so pg_parquet
make use of aws sdk to perform it. And then configure object_store with the temp
token that it fetched. This is why, pg_parquet checks expiration of the tokens for s3 store.
It will fetch the temp token if it expires. (obviously if you configured temp token auth via config)

For azure blob store, we do not need to recreate tokens because object_store handles it.
@aykut-bozkurt aykut-bozkurt force-pushed the aykut/cache-object-stores branch from 17d4aa8 to 9d90243 Compare January 22, 2025 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants