Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to retry vault token retrieval #442

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wagnerm
Copy link
Contributor

@wagnerm wagnerm commented Apr 3, 2023

This change adds the retryVaultTokenRetrieval option to the Action that will enable retries for retrieving the Vault token, via the HTTP error retries implemented by the got package.

The problem I'm encountering is that sometimes the vault token retrieval, via the JWT method, results in a HTTP error, mostly 500s, and these kind of errors are not retried in the current version of vault-action. vault-action simply fails and there's no mitigation. This appears to be because the got package, by default, does not retry POST request types.

So the new retryVaultTokenRetrieval option adds the POST request to the set of request types the got package will retry when the option is specified.

Looking through the code, the vault token retrieval in retrieveToken is the only place where a POST request is performed, so the enablement of the option should only affect that function.

Other options for implementation:

I'm not the biggest fan of the addition of this option and would rather the vault-action always retry failed vault token retrievals, but I'm unsure if the community would appreciate that change in strategy. Alternatively these are the options I've thought of that may or may not be more acceptable than this change. I'm open to opinions.

  • Use the change proposed in this PR.
  • Always retry all POST requests performed within vault-action and remove the option retryVaultTokenRetrieval. This is a change in behavior that might result in addition Vault server load if retries are added by default.
    Basically the POST request type would be added to the default got client options.
  • If we are worried about addition POST requests happening in this Action, we could refactor the way the client is initiated and passed to retrieveToken so that only the POST requests tried within getClientToken are retried.

Sometimes we might encounter errors when retrieving the Vault token
using a method like JWT. In those cases, the action does not retry the
request today because the got package does not try POST requests by default.

This change adds an option called retryVaultTokenRetrieval that will
add the POST method to the retriable methods got uses. The post method
is not used in any other place in this action, so having the POST method
added to the defaultOptions seems okay for now.
When the retryVaultTokenRetrieval option is set in the action
we will now see HTTP errors when retrieving the Vault token retried.

This adds a test block to test the client.post that is performed
during the token retrieval is retried on an HTTP error, like a 500.
@hashicorp-cla
Copy link

hashicorp-cla commented Apr 3, 2023

CLA assistant check
All committers have signed the CLA.

@wagnerm wagnerm marked this pull request as ready for review April 3, 2023 21:23
@maxcoulombe
Copy link
Contributor

maxcoulombe commented May 19, 2023

Hey @wagnerm thanks for taking a jab at this issue!

would rather the vault-action always retry failed vault token retrievals

I agree, I feel like the action should be resilient by itself and this is not something users should have to think about and opt into. I'd be keen to find a universal solution and drop the additional retryVaultTokenRetrieval input.

Always retry all POST requests performed within vault-action

As you already pointed out there are drawbacks to this. I'm worried this would have undesirable side-effects. POSTs should only be retried for specific situations, like here when trying to authenticate, but a blanket retry on all POST requests will surely bite us in the future.

only the POST requests tried within getClientToken are retried

This seems like the best solution to me. Retry the v1/auth/${path}/login request or getClientToken method systematically with sensible default values (say, up to 5 retries spaced by 1 seconds?).

I don't think got's client options allow you to specify a retry strategy for specific paths so the most pragmatic solution is likely to wrap the token creation request in a small promise retry pattern.

Hope it all make sense!

@wagnerm
Copy link
Contributor Author

wagnerm commented May 19, 2023

I don't think got's client options allow you to specify a retry strategy for specific paths so the most pragmatic solution is likely to wrap the token creation request in a small promise retry pattern.

@maxcoulombe My only reservation with this approach is that it seems like it would be reimplementation of some of the got client's already good functionality and the new code we add would also have to be tested for thoroughly in this codebase as well.

Alternatively we could simply alter the options we pass to the got client in the retrieveToken function invocation. I'm referring to this line. This would at least only mutate the requests made within that function.

Basically I'm thinking about doing something this (please correct me or provide a suggestion because I'm not that well versed in js 😄):

+    // copy the defaultOptions so we don't mutate in case it is used later on the in Action.
+    let retrieveTokenClientOptions = JSON.parse(JSON.stringify(defaultOptions));
+    retrieveTokenClientOptions.retry.methods = [...got.defaults.options.retry.methods, 'POST'];

+    const vaultToken = await retrieveToken(vaultMethod, got.extend(retrieveTokenClientOptions));
-    const vaultToken = await retrieveToken(vaultMethod, got.extend(defaultOptions));

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants