New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
queued up PIPEWAIT transfers do not timeout correctly #13227
Comments
can you provide a complete stand-alone example that reproduces the problem? |
Not sure if the same issue as the original poster, but let's try this with the command line tool: However, run this: Interestingly, curl behaves as expected (timed out after 6 seconds) if any of the following is met:
|
Hi @bagder, I didn't try for a fully stand-alone program since I would have to replace the eventloop part with another publicly available one and integrate with the curl logic, we are using our own eventloop mechanism which is already in place. But the curl piece of code is exactly the same what I have captured. I was just concerned if there is anything wrong in my expectations or any aspect in the curl configurations I am setting could add to the delay we are seeing in my experiments. |
Hi @Smackd0wn, I will try out option 1 and 2 above which you suggested just to be sure if it's the same thing happening for me as well. However, regarding point 3, are you saying the delay shouldn't be seen for curl versions below 8.1.0, because the curl version I am using is 7.84.0, where I am seeing this issue. |
Hi @Smackd0wn, seems like you were spot on! I tried out a few more combinations and was able to see that if i DISABLE With
However, once i ENABLE Is this a known issue ? Any latest release has a fix delivered for this ? How do I use this option and still make the timeout happen in the configured time. |
Probably not a known issue. I think using the PIPEWAIT option should not interfere with timeout, so it's actually a bug of (lib)curl, not your usage of the library. I don't think the latest release (8.7.1) fixed this issue. Unfortunately I have not yet found the root cause of the issue. But my test commands above using the |
After running git bisect, the issue is likely caused by commit 843b3ba by @bagder Here is my understanding of the problem in the
Prior to that commit, pending handles were not removed from the main list. If I comment out the following line, the bug goes away in my case. Lines 1939 to 1940 in df3d5ef
This change is between 8.0.1 and 8.1.0. So I am less certain it is the same issue encountered by @rahulKrishnaM who uses 7.84.0. |
Hi @Smackd0wn, I tried moving to curl version 8.1.2 and applying the patch you mentioned above. I still see the same observation as in the ticket description. Also, reattempted the same disabling |
When there is a "change" in a multi handle and pending handles are moved back to the main list to be retested if they can proceed further (for example a previous transfer completed or a connection has a confirmed multiplexed state), the timeout check in multi_runsingle() would not trigger because it required an established connection. This could make a pending tranfer go back to pending state even though it had been "in progress" for a longer time than permitted. By removing the requirement for an associated connection, the timeout check will be done proper even for transfers that has not yet been assigned one. Fixes #13227 Reported-by: Rahul Krishna M
Can you check if #13276 makes anything better? |
It caused segmentation fault in |
Hi @bagder, I tried out the patch, but I think I still see the drift. And, after some time there was a crash, with bt as below (as @Smackd0wn has reported).
But, there was evident drift in timeout before this crash happened. |
I believe I've fixed that in an update
@Smackd0wn's curl -Z cmdline test went from 24 seconds to 6 seconds for me, so it does have some impact in my tests. As we don't have a recipe for your case @rahulKrishnaM, it is harder for me to debug that. |
Thank you very much for the fix. Indeed it fixed the segfault and improved the timeout behaviour. Now timeout without any connections is mostly correct. However, when I increased the numbers and ran |
I believe the reason for this is related to the optimization I did to fix #2369, which is not really optimal when where are many transfers pending due to PIPEWAIT... I have few ideas to work on. |
I'm yet to try out with the latest changes @bagder. However, when I said the drift was still there, I am trying out with a higher timeout so that the variation is evident. I tried out with 25 seconds as connection_timeout, where I could see the variations as: This is the pattern of logs when timeouts happen.
|
When there is a "change" in a multi handle and pending handles are moved back to the main list to be retested if they can proceed further (for example a previous transfer completed or a connection has a confirmed multiplexed state), the timeout check in multi_runsingle() would not trigger because it required an established connection. This could make a pending tranfer go back to pending state even though it had been "in progress" for a longer time than permitted. By removing the requirement for an associated connection, the timeout check will be done proper even for transfers that has not yet been assigned one. Ref #13227 Reported-by: Rahul Krishna M Closes #13276
When there is a "change" in a multi handle and pending handles are moved back to the main list to be retested if they can proceed further (for example a previous transfer completed or a connection has a confirmed multiplexed state), the timeout check in multi_runsingle() would not trigger because it required an established connection. This could make a pending tranfer go back to pending state even though it had been "in progress" for a longer time than permitted. By removing the requirement for an associated connection, the timeout check will be done proper even for transfers that has not yet been assigned one. Ref #13227 Reported-by: Rahul Krishna M Closes #13276
Since we can go to the CONNECT state from PENDING, potentially multiple times for a single transfer, this change introdues a SETUP state that happens before CONNECT when doing a new transfer. Now, doing a redirect on a handle goes back to SETUP (not CONNECT like before) and we initilize the connect timeout etc in SETUP. Previously, we would do it in CONNECT but that would make it unreliable in cases where a transfer goes in and out between CONNECT and PENDING multiple times. SETUP is transient, so the handle never actually stays in that state. Additionally: take care of timeouts of PENDING transfers in curl_multi_perform() Ref: #13227
Since we can go to the CONNECT state from PENDING, potentially multiple times for a single transfer, this change introdues a SETUP state that happens before CONNECT when doing a new transfer. Now, doing a redirect on a handle goes back to SETUP (not CONNECT like before) and we initilize the connect timeout etc in SETUP. Previously, we would do it in CONNECT but that would make it unreliable in cases where a transfer goes in and out between CONNECT and PENDING multiple times. SETUP is transient, so the handle never actually stays in that state. Additionally: take care of timeouts of PENDING transfers in curl_multi_perform() Ref: #13227 Closes #13371
@rahulKrishnaM please try again with current master and let us know if there is any difference |
Hi @bagder I need to check this. Will get back with the latest commit changes how my testing goes. |
Presumed to be fixed. |
Hi, below program is a pseudocode (only relevant pieces) I came up with just to demonstrate the issue I am facing. Basically, we are trying to send multiple requests to a non-existent IP (
192.168.1.8
), to ensure that we get a connection_timeout callback in the configured time (2500 millisec
below).Log prints from above code, where new requests are attempted to be sent out.
Log prints from above code, where the responses/timeout as informed by curl in CheckFinishedConnections() are traced. Also captured inline the actual time it took from the time the request was sent to curl library (which is diff between above print and below print).
We are relying on curl's connection_timeout parameter in order to take some action on application side. But here, as you see having set a 2.5sec connection timeout, the expiry is happening at different range of times (2.5sec to 7sec). This is a small example where 1 request was attempted per second, and is continued for 10 seconds (so ten requests in total).
I have also seen cases where it goes till 15seconds. I also tried increasing connection_timeout to a higher value (15sec), but then the timeouts were in the range of 15 to 120seconds. Is there some explanation to this, i want to know if the way are using the CURLMOPT_TIMERFUNCTION is proper or not.
Also, as an fyi, we are running an eventloop which has a granurality of 50milliseconds, so all the application timer events(timer_m above) gets polled at this frequency, and I have tried tracing the logs to see that the timer polls are happening properly from application side.
I expected the following
connect timeout to a non-existent ip should return in the configured timeout, or atleast in the near vicinity(+1s).
curl/libcurl version
curl 7.84.0
operating system
Fedora release 37
The text was updated successfully, but these errors were encountered: