Azure CLI task fails with AADSTS700024 after 60 minutes #28708
refresh OIDC token is a feature |
Callback interface proposals
Different external identity providers (IdP) have different ways of retrieving the ID token:
I had a discussion with MSAL team today and proposed 2 possible callback interfaces:
|
Mitigation: Extend task duration to 60 minutes
Warning: This mitigation doesn't work with Azure CLI 2.59.0. See #28708 (comment).
An ID token lasts for 5 minutes on GitHub Actions and 10 minutes on Azure DevOps, but an access token lasts for 60 minutes. When you run
After the ID token expires, if acquiring an access token for other scopes, such as
as currently there is no access token for that scope in the token cache, Azure CLI/MSAL will try to get an access token with the ID token. However, as the ID token has expired, the command fails with AADSTS700024.
So, the mitigation is pretty straightforward: acquire all access tokens before the ID token expires. You have to know which scopes are used in your pipeline task and call
For example:
Warning: Even though GitHub Actions can mask the access token as
You MUST specify
Then subsequent commands using these scopes will use the access tokens saved in the token cache, so they won't fail after the ID token expires, but they will still fail after the access token expires (60 minutes). |
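The mitigation described above can be sketched in Python as follows. This is a minimal sketch, not part of Azure CLI: the helper names `build_prefetch_commands` and `prefetch_tokens` are hypothetical, and the scopes in the usage comment are only illustrative, so substitute the scopes your own task uses. It runs `az account get-access-token` with `--output none` once per scope right after login, so the access tokens are saved to the token cache without being printed. As warned above, this does not help with Azure CLI 2.59.0.

```python
import subprocess
from typing import Callable, Iterable, List


def build_prefetch_commands(scopes: Iterable[str]) -> List[List[str]]:
    """Build one `az account get-access-token` command per scope.

    `--output none` keeps the access token out of the job log.
    """
    return [
        ["az", "account", "get-access-token", "--scope", scope, "--output", "none"]
        for scope in scopes
    ]


def prefetch_tokens(scopes: Iterable[str], run: Callable = subprocess.run) -> None:
    """Acquire an access token for every scope while the ID token is still valid.

    `run` is injectable so the function can be exercised without Azure CLI.
    """
    for command in build_prefetch_commands(scopes):
        # check=True makes a failed acquisition stop the task immediately.
        run(command, check=True)


# Example usage (illustrative scopes, call right after `az login`):
# prefetch_tokens([
#     "https://storage.azure.com/.default",
#     "https://vault.azure.net/.default",
# ])
```

Running this as the first step after login keeps every later command on cached access tokens for up to 60 minutes.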
I tried fixing the issue with the provided mitigation, but it still persists. Maybe I'm doing something wrong?
After that, I added a step to mitigate the issue:
But after ~10 minutes I'm still getting:
Did I miss something? I use https://www.npmjs.com/package/@azure/service-bus |
Thanks for the mitigation @jiasli. However, I don't think I'm hitting the issue where the Azure CLI tries to acquire an access token for a different audience after the ID token has expired. I'm fairly confident that the
The general flow is:
The time it takes to swap slots varies greatly; however, more than 5 minutes have always elapsed by the time it's done. Now, what is strange is that stopping the slot sometimes works and sometimes doesn't, depending on how much time has passed since we ran
To me, it sounds like the access token expires "quicker" than before. Edit: I checked across many workflow runs, and to me it looks like the access token expires after 10 minutes. |
@Kapsztajn, I can successfully get an access token for
Decoded claims:
I am not entirely sure why this line is printed:
The Azure Service Bus client library for JavaScript SDK also didn't fail with |
@mderriey, this seems odd as all these operations are indeed ARM operations. Could you check the actual expiration time of the access token issued for ARM?
|
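One way to check the actual lifetime of an access token is to decode its claims locally. The sketch below assumes the access token is a JWT carrying `iat` and `exp` claims, which is an assumption about the token format rather than a guarantee; the function names are hypothetical. It does not verify the signature, so it is only suitable for inspection.

```python
import base64
import json
from datetime import datetime, timezone


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT without verifying its signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments in a JWT omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def token_lifetime_seconds(token: str) -> int:
    """Seconds between issuance (`iat`) and expiry (`exp`)."""
    claims = decode_jwt_claims(token)
    return claims["exp"] - claims["iat"]


def expires_at(token: str) -> datetime:
    """Expiry time of the token as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(decode_jwt_claims(token)["exp"], tz=timezone.utc)
```

The token itself can be obtained with `az account get-access-token --resource-type arm --query accessToken --output tsv`; avoid echoing it to the job log.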
Hi @Kapsztajn, the suggested mitigation did not work for me either. It fetched a token with a reasonable expiry, but I saw the same error once the OIDC token expired after 5 minutes. I propose a workaround: fetch the OIDC token every 4 minutes to avoid the expiry. I got this working, and here is what I did: I inserted the following step in my workflow just before the step where this token expiry issue was popping up:
Could you try this out and see if this works for you as well? |
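The original step is not preserved in this thread, so the following is only a sketch of the idea, not the author's script. It assumes a GitHub Actions job with `id-token: write` permission, where the runner exposes `ACTIONS_ID_TOKEN_REQUEST_URL` (assumed to already contain a query string) and `ACTIONS_ID_TOKEN_REQUEST_TOKEN`. The function names are hypothetical.

```python
import json
import os
import subprocess
import urllib.parse
import urllib.request
from typing import List

# Audience commonly used for Microsoft Entra workload identity federation.
AUDIENCE = "api://AzureADTokenExchange"


def build_id_token_request(request_url: str, request_token: str,
                           audience: str = AUDIENCE) -> urllib.request.Request:
    """Build the HTTP request that asks GitHub for a fresh OIDC ID token."""
    url = f"{request_url}&audience={urllib.parse.quote(audience, safe='')}"
    return urllib.request.Request(url, headers={"Authorization": f"bearer {request_token}"})


def build_login_command(client_id: str, tenant_id: str, federated_token: str) -> List[str]:
    """Build the `az login` command that signs in with the fresh ID token."""
    return [
        "az", "login", "--service-principal",
        "--username", client_id,
        "--tenant", tenant_id,
        "--federated-token", federated_token,
        "--output", "none",
    ]


def refresh_login(client_id: str, tenant_id: str) -> None:
    """Fetch a new ID token from the runner and log in to Azure CLI again."""
    request = build_id_token_request(
        os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
        os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"],
    )
    with urllib.request.urlopen(request) as response:
        id_token = json.load(response)["value"]
    subprocess.run(build_login_command(client_id, tenant_id, id_token), check=True)
```

Calling `refresh_login` roughly every 4 minutes, for example from a loop around the long-running work, keeps the login newer than the 5-minute ID token lifetime.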
Good suggestion @jiasli, thanks. Here's what I ran:

steps:
  - name: Login to Azure
    uses: azure/login@v2
    with:
      client-id: ${{ env.oidcAppRegistrationClientId }}
      tenant-id: ${{ env.azureTenantId }}
      allow-no-subscriptions: true
      enable-AzPSSession: true
  - name: Check token expiry
    shell: bash
    run: |
      echo "Current date: $(date '+%Y-%m-%dT%H:%M:%S')"
      echo "Token expiration: $(az account get-access-token --resource-type arm --query expiresOn --output tsv --debug)"
      echo "Token #2 expiration: $(az account get-access-token --resource-type arm --query expiresOn --output tsv --debug)"

And the output (debug output omitted):

Current date: 2024-04-11T06:57:14
Token expiration: 2024-04-11 07:57:14.000000
Token #2 expiration: 2024-04-11 07:57:14.000000

So the token is valid for 1 hour. And both calls to
I'm not sure what happens, then... Thanks again, let me know if I can perform some more testing if anything comes to mind. |
Apologies for the confusion. As I tested today, the mitigation I provided in #28708 (comment) stopped working for Azure CLI 2.59.0 because of an MSAL regression introduced in 1.27.0 (AzureAD/microsoft-authentication-extensions-for-python#127, AzureAD/microsoft-authentication-library-for-python#644), which was adopted by Azure CLI 2.59.0 (#28556). This regression makes MSAL's
I will work with MSAL on this issue with high priority.
Workaround
For now, please keep using a service principal secret for authentication to get unblocked: https://github.com/marketplace/actions/azure-login#login-with-a-service-principal-secret |
My question is why this has popped up as an issue recently. We've had pipelines run for well over 20 minutes before and never seen this. But within the last week, it seems any workflow using Azure CLI with OIDC federated auth is experiencing this issue. |
@iamrk04
I had to add |
@smokedlinq, in my case, it's due to a new version of the GitHub hosted runner image for
The image went from
You can see which image your run uses in the "Set up job" step at the very top. |
@mderriey I assumed something like that, I was more referring to how that broke inside of |
We started having problems with the v2.59.0 az cli and rolled back as a workaround. I'm not sure what about the cli release makes this more/less likely to hit this. |
@smokedlinq, please refer to my comment #28708 (comment). |
This workaround #28708 (comment) proposed by @iamrk04 of periodically calling
This workaround #28708 (comment) proposed by @dghubble of using an old version is a correct one. As I suggested in #28708 (comment), using a service principal secret for authentication is another acceptable workaround. |
@jiasli Service principals are unacceptable for some of us as our security certification would require we rotate them on a regular basis. OIDC does not add that additional burden given that they are clearly short lived. |
@andre-qumulo, we plan to fix the 5-minute expiration issue in the next version of Azure CLI which will be 2.60.0 and released on 2024-04-30. Using a service principal is only a temporary workaround. Secret rotation usually happens on a monthly basis which is far beyond the time we need to fix it. I have created a separate issue to track it: |
Thanks @jiasli! The mitigation steps for Azure DevOps provided here of using a service principal secret were effective. (I ran into some trouble finding the organization id while following the instructions but was able to find the organization id with these steps: https://medium.com/@shivapatel1102001/get-list-of-organization-from-azure-devops-microsoft-account-861ea29dae93) |
@TomWildenhain, based on my understanding, the steps provided by https://learn.microsoft.com/en-us/azure/devops/pipelines/library/connect-to-azure?view=azure-devops don't require organization ID when creating a service connection using service principal secret. Could you let me know which article you are following? |
@jiasli Org id is a 1P policy. |
# Description
Long-running tests have started to fail due to the az login authentication expiring. Based on the error message (see linked issue), the authentication is only valid for 5 minutes. This seems to be a known issue [based on discussion here](Azure/azure-cli#28708 (comment)), but there's no fix, so adding this temporary workaround until there's a way to extend the time the auth is valid.
## Type of change
- This pull request fixes a bug in Radius and has an approved issue (issue link required).
- This pull request adds or changes features of Radius and has an approved issue (issue link required).
- This pull request is a minor refactor, code cleanup, test improvement, or other maintenance task and doesn't change the functionality of Radius #7490
Fixes: #issue_number
Signed-off-by: sk593
@jiasli Thanks for your help. I was following the instructions in a banner at the top of ADO after creating the manual service connection. The banner states:
With a link to: https://learn.microsoft.com/en-us/azure/devops/pipelines/release/configure-workload-identity?view=azure-devops I used the instructions to call the API here to get the org id: https://medium.com/@shivapatel1102001/get-list-of-organization-from-azure-devops-microsoft-account-861ea29dae93 |
@TomWildenhain, thanks for the information. If you used service principal secret to create the service connection, I don't think the federated identity credential added to the app is actually used. |
@jiasli Is it possible to give a realistic timeline for a fix? I am wondering if it makes sense to ask for a rollback of the CLI version contained in actions/runner-images, which is used by both GitHub Actions and Azure DevOps. |
We are seeing the same issue related to moving away from service principal secrets. We are looking into adding logic for all Az CLI calls using the ARM token to ensure it gets refreshed (but not as a background process) to get the OIDC token from |
If you can help resolve this, it would be appreciated |
I have exactly the same use case as @TomWildenhain. Is there a way to make the token validity period customizable? We can't use a service principal as that's discouraged by the cred-free best practices. Even a workaround would be much appreciated. |
Have the same issue for our long-running tasks:
|
@panpanwa we are not using GitHub Actions. We're using Azure DevOps in YAML, e.g.

- task: AzureCLI@2
  displayName: Run load profile
  inputs:
    azureSubscription: $(federatedCredConnection)
    scriptType: ps
    scriptLocation: scriptPath
    scriptPath: $(Pipeline.Workspace)/test.ps1
|
Acquiring access token with expired OIDC token fails with:
As the error indicates, the OIDC token is only valid for 10 minutes. After it is passed to az login via --federated-token, Azure CLI cannot get a new OIDC token after the OIDC token expires. This is the designed v1 behavior of OIDC token support (#19853).
However, as Azure DevOps task AzureCLI@2 (microsoft/azure-pipelines-tasks#17633) and GitHub Action azure/login@v2 (Azure/login#147) now support OIDC token authentication, and workload identity federation is the recommended approach, this limitation is becoming more prevalent.
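To make the timing concrete, here is a toy model of the designed v1 behavior described in this thread. It rests on simplifying assumptions: any access token in the cache was acquired at login time, access tokens last 60 minutes, and an expired ID token cannot be replaced. The function name and parameters are hypothetical, and nothing here calls Azure CLI or MSAL.

```python
ACCESS_TOKEN_LIFETIME_MIN = 60  # access token lifetime stated in this thread


def request_succeeds(minutes_since_login: float, scope_cached: bool,
                     id_token_lifetime_min: float) -> bool:
    """Model whether a token request for a scope can succeed.

    A cached access token works until it expires. Without a cached token,
    a new one can only be acquired while the ID token is still valid.
    """
    if scope_cached:
        return minutes_since_login < ACCESS_TOKEN_LIFETIME_MIN
    return minutes_since_login < id_token_lifetime_min


# GitHub Actions (5-minute ID token), 7 minutes after login:
# request_succeeds(7, scope_cached=False, id_token_lifetime_min=5)  # new scope
# request_succeeds(7, scope_cached=True, id_token_lifetime_min=5)   # cached scope
```

In this model, only a way to obtain a fresh ID token removes both limits.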
Possible solutions
References