-
Notifications
You must be signed in to change notification settings - Fork 474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net7 Minimal API running under provider.al2 runtime timeout on quick fire successive requests #1543
Comments
@ashishdhingra & @normj - Thanks - to confirm you were able to deploy and reproduce? Not sure this is related to annotations per label? |
@Simonl9l I cloned your repo and although I didn't run it through the GitHub actions I ran same Is there anything in the CloudWatch logs when you get the timeout? |
@normj thanks of taking a look! You'll note that the actual Minimal API response succeeded per the top line of the log; but you will see the timeout as highlighted. Our assumption is that the runtime is pulling the rug out for some reason, on the basis of a timeout, but this is somehow erroneous, as the full elapsed time from the first call is only ~5 seconds. logging shows: logging shows a 30 sec gap here...but there is no reason for the code to stall, as it's exactly the same that ran successfully the two prior iterations. Here is the full CloudWatch log of the three successive executions to where the third fails. Also to confirm that you are using the |
@normj & @ashishdhingra - sorry to bug, is there anything we can do to assist the reproduction effort? I believe I sent Norm a direct email offline if we want to connect/screen share etc. |
It's unfortunate that you and team have not be able to recreate this issue, that we are aware. We do though have more information. Our hypothesis is that there is something wrong with the raw AL2 image, that only manifest when using the underpinning of the We get to this because we can build a self contained executable in both .Net 7 (and .Net 8 preview 6 - just to see) and deploy them to a .Net 6 runtime lambda, and they both work as expected, with no timeouts - on basis that the executable is called This means we have a way forwards, and can avoid the AL2 runtime entirely. It had not occurred to us that one could do this before. However if we switch the Lambda runtime to the AL2 runtime, using the same packages in turn it runs as expected but we get exactly the same failure observations as we did before. Of note, if we build a self-contained .Net 6 package as above and deploy that to AL2 runtime, it also fails as observed. Our assumptions is that when .Net 8 goes LTS, the existing .Net lambda runtime will be updated to support it, and what we have identified will likely impact few, if anyone. However as MSFT support AOT builds of the .Net runtime, that say underpins Minimal API, this may surface for those trying to run that in the current AL2 image. Perhaps well event see Amazon Linux 2023 but then - right? Well also probably be fine in the future with running Minimal API AOT images (when supported) in the current dotnet runtime too. Is there anything y'all know about that we may be missing in running any self-contained or AOT'd packages in the dotnet runtime that might present a future problem here? |
I appear to be having this issue as well. I have been spinning why wheels for 3 weeks on multiple .NET projects, no matter what I do my .NET 8 lambda functions timeout (30 seconds at API gateway, 60 seconds at the end of "execution") for the most simplest of things. I am not using AoT. This is making Lambda a completely unusable platform for us. I had mentioned it in the Tracking .NET 8 support issue nearly a month ago. |
@dguisinger Is there a sample project you can share that demonstrates this behavior? |
I'll throw together a sample project tonight |
@normj I am at a complete loss. I created a new project, has identical Program.cs code, uses MediatR for CQRS and contains DynamoDB transaction logic in a separate assembly.... and it works just as it would be expected to. I don't understand any differences between this project, and the two others I have which I have spent hundreds of hours trying to figure out why every single call timeouts when ran in Lambda (but work fine on localhost) The biggest differences are in CICD pipeline...my two other projects use CodeBuild/CodePipeline in a separate DevOps account to build and deploy the code, but uses identical CDK code for building the lambda image and deploying it. The only difference (which should not make a difference, in theory) is this project I'm running the CDK deploy and docker build on a Windows machine, whereas with CodeBuild it would be on linux. As mentioned in my previous posts, the more logging I added, the more random the position where updates in the log stopped.... it even improved greatly when I removed all logging from the app, but didn't solve the problem. I have absolutely no idea how to even tackle this at this point, I can't keep throwing hundreds of hours into code that performs perfectly under Kestrel and then doesn't execute in Lambda... and there appear to be no ways to get the Lambda runtime to give useful debugging information. |
@dguisinger we found that building a single executable (either AoT or bundled runtime) seemed to solve what ever we were doing to cause the timeout symptoms. It's unclear if your symptoms are the same as what you were seeing. Given our example, the AWS team were unable to recreate it, even though with our sample we could recreate it readily, that was also a default template lambda not using other assemblies as you indicate. We'd surmised that something was not quite 100% good in relation to the lambda runtime and dotnet runtime - but we can't evidence it. We've not had the wherewithal to experiment further, or really get into any preview net 8.0 or AL2023 support testing. Our hope was that this would be solved with the AL2032 runtime, but fear given your experiences this has somehow carried over. It's also interesting that you are using CDK. But it's unclear how that might be a factor. |
@Simonl9l i had started down that path of the AoT/bundled runtime and never got it working, but I didn't spend too much time on it (its so painful to tweak one line at a time and push it back into the CICD pipeline and wait for it to do its thing) At the moment, Swagger pages respond 100% of the time..... and occasionally I can get the WebAPI to respond (instantly), and when it does I can pound the API relentlessly and it stays working for hundreds of calls... until it idles for a few minutes, and then its back to timeouts. When I check the CloudWatch logs, its the exact same stream for the instance that was timing out initially that suddenly starts responding, so its not spinning up a new instance to handle the request. This leads me to guess its a problem in the AspNet hosting layer which last time I checked is still targeting 6.0. As I had mentioned in the .NET 8 tracking issue, when I had changed the logging level on .NET to a higher level, even Swagger stopped responding. And I had also found that adding more inline logging to find how far it was making it into my code would actually sometimes change the point at where my code appeared to stop running. |
@Simonl9l @normj Interestingly, I tried publish ... --self-contained today on my company's APIs that are having this issue, it changed the behavior to make it work more often, but it still has the timeouts. However it allowed me to explore the pattern in the logs. This would seem to indicate the container keeps restarting the code on every request if its been timing out. Could the runtime be terminating on a failure, causing execution to stop, with no response back to Lambda itself telling it execution failed - resulting in it believing it timed out? |
@dguisinger I'm assuming this is an ASP .NET Core based Lambda function. Would be possible to share, at least a redacted version, of the startup code for the function? Like the |
@normj sure, here you go I did try putting my sample project into my CICD pipeline that the other projects go through just in case building with CodeBuild was causing the issue, but the sample still worked. I also commented out all of the controllers/actions in one of my broken projects to get down to just 1 controller/action and it was still broken, so the sample being simplified in that regard wasn't an issue. The only difference I can think of at this point is the broken projects have several more assemblies (for the DTOs, database repository/entities, etc) and in-house nuget packages for cross-API shared code that the sample project didn't.
|
Describe the bug
Related to discussion #1424, but figure an Issue might garner more attention, since prior response there were stale and now have a reproducible sample.
In summary we have a
.Net 7
Minimal API setup (serverless.AspNetCoreMinimalAPI
) switched to .Net7 and using theamazon.lambda.tools
to build the package using the docker image process via the-ucfb True
option.Our sense is that in
.Net7
the Host/Middleware, or something related to its interaction with theprovider.al2
runtime that may have changed with the newer .Net version is now possibly incompatible the.AddAWSLambdaHosting
middleware.Expected Behavior
Successive and continuous request should complete successfully
Current Behavior
Once deployed we find:
/
can be continuously quick-fired without error./calculator/add/1/2
initially succeeds but then fail after ~3-5 rapid requests with a timeout, as set by the lambda deployment (30 secs in our case).Reproduction Steps
We have a GitHub workflow enabled clone-able/deployable repo here with instructions here
We have used the default template samples so that we can eliminate if any of our code is introducing a side effect, and whist we had added some of that code it's commented out.
Possible Solution
We need to migrate to .Net7 such that we can use latest C# features.
Additional Information/Context
We note comments in the
RuntimeSupport.Client.InvocationResponse
there is special handling of theOutputStream
for successive invocations, but are unsure of the implication in this scenario, and assume this in an optimization.Of note:
/
(default) route request in the Map does not leverage any default middleware where as the other routes to map the request on the controller do.Net7.0
is not explicitly supported for minimal API, given thelambda.NativeAOT
template, we assume basic compatibility with theprovider.al2
custom runtime<PublishAot>false</PublishAot>
in thecsproj
, to eliminate any concern that this my be an issue.AWS .NET SDK and/or Package version used
Amazon.Lambda.AspNetCoreServer.Hosting - 1.6.0
AWSSDK.Extensions.NETCore.Setup - 3.7.7
Amazon.Lambda.RuntimeSupport - 1.8.8
Amazon.Lambda.Core" Version - 2.1.0
Amazon.Lambda.Serialization.SystemTextJson - 2.3.1
Targeted .NET Platform
.Net 7
Operating System and version
AmazonLinux
The text was updated successfully, but these errors were encountered: