
queued up PIPEWAIT transfers do not timeout correctly #13227

Closed

rahulKrishnaM opened this issue Mar 29, 2024 · 19 comments

@rahulKrishnaM

rahulKrishnaM commented Mar 29, 2024

Hi, the program below is pseudocode (only the relevant pieces) that I put together to demonstrate the issue I am facing. Basically, we are sending multiple requests to a non-existent IP (192.168.1.8) and expect to get a connection timeout callback within the configured time (2500 milliseconds below).

void Init() {

    curl_global_init(CURL_GLOBAL_ALL);
    multi_handle_m = curl_multi_init();

    curl_multi_setopt(multi_handle_m, CURLMOPT_TIMERFUNCTION, CurlTimerCallback);
    curl_multi_setopt(multi_handle_m, CURLMOPT_TIMERDATA, this);

    curl_multi_setopt(multi_handle_m, CURLMOPT_SOCKETFUNCTION, CurlSocketCallback);
    curl_multi_setopt(multi_handle_m, CURLMOPT_SOCKETDATA, this);

    curl_multi_setopt(multi_handle_m, CURLMOPT_MAXCONNECTS, 1536L);
    curl_multi_setopt(multi_handle_m, CURLMOPT_MAX_HOST_CONNECTIONS, 512L);
    curl_multi_setopt(multi_handle_m, CURLMOPT_MAX_TOTAL_CONNECTIONS, 2048L);
    curl_multi_setopt(multi_handle_m, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
    curl_multi_setopt(multi_handle_m, CURLMOPT_MAX_CONCURRENT_STREAMS, 4096L);
}

void StartConnection(const std::string& url)  {

    log("Starting a connection :%s", url.c_str());

    CURL* easy_handle = curl_easy_init();

    long time = 5;
    curl_easy_setopt(easy_handle, CURLOPT_TIMEOUT, time);
    curl_easy_setopt(easy_handle, CURLOPT_CONNECTTIMEOUT_MS, (time * 1000) / 2);
    curl_easy_setopt(easy_handle, CURLOPT_HTTP_VERSION, (long)CURL_HTTP_VERSION_2_PRIOR_KNOWLEDGE);
    curl_easy_setopt(easy_handle, CURLOPT_PIPEWAIT, 1L);

    curl_easy_setopt(easy_handle, CURLOPT_URL, url.c_str());
    curl_easy_setopt(easy_handle,  CURLOPT_CUSTOMREQUEST, "GET");
	
    curl_easy_setopt(easy_handle, CURLOPT_ERRORBUFFER, errorBuff_m);

    curl_easy_setopt(easy_handle, CURLOPT_DNS_CACHE_TIMEOUT, 60L);
    curl_easy_setopt(easy_handle, CURLOPT_TCP_KEEPALIVE, 1L);
    curl_easy_setopt(easy_handle, CURLOPT_TCP_KEEPIDLE, 60L);
    curl_easy_setopt(easy_handle, CURLOPT_TCP_KEEPINTVL, 60L);

    auto result = curl_multi_add_handle(multi_handle_m, easy_handle);

    // store the easy_handle and url to a global map to retrieve later
}

void StartHttpTimer(unsigned long timeout_ms) {

    // arm a timer: timer_m to expire after timeout_ms as curl has indicated.
    timer_m.after(milliseconds(timeout_ms), this, TimerTriggered);
}

void TimerTriggered() {

    int running_count = 0;
    CURLMcode rc = curl_multi_socket_action(multi_handle_m, CURL_SOCKET_TIMEOUT, 0, &running_count);

    CheckFinishedConnections();
}

void CheckFinishedConnections() {
    
    while (true) {

        int msg_count = 0;
        CURLMsg* msg = curl_multi_info_read(multi_handle_m, &msg_count);

        if (msg == nullptr) {
            break;
        }

        if (msg->msg == CURLMSG_DONE) {
            // retrieve url from global map using easy_handle.
            log("Error :%s, connection: %s, result: %d", errorBuff_m, url.c_str(), (int)msg->data.result);

            curl_multi_remove_handle(multi_handle_m, msg->easy_handle);
        }
    }
}

// CURLMOPT_TIMERFUNCTION callback: timeout_ms == -1 means delete the timer,
// timeout_ms >= 0 means arm a single (non-repeating) timer for that many ms.
int CurlTimerCallback(CURLM* multi_handle, long timeout_ms, void* user_pointer) {

    if (timeout_ms >= 0) {
        StartHttpTimer(timeout_ms);
    }
    else {
        StopHttpTimer();
    }
    return 0;
}
	
int main() {

    Init();

    // start 10 connections to a non-reachable IP
    for (int i = 1; i <= 10; i++) {
        StartConnection("http://192.168.1.8:9001/code/" + std::to_string(i));
    }

    while (true)
    {
        // run eventloop logic for processing timers: i.e. timer_m here
        // sleep for 50 milliseconds
    }
}
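
(CurlSocketCallback is omitted above since it is not central to the timing issue. For completeness, a typical shape of such a callback is sketched below; WatchSocket/UnwatchSocket are hypothetical placeholders for the application's event-loop registration helpers, not the actual code.)

int CurlSocketCallback(CURL* easy, curl_socket_t fd, int what, void* user_pointer, void* socket_pointer) {

    if (what == CURL_POLL_REMOVE) {
        UnwatchSocket(fd);              // stop monitoring this fd
    }
    else {
        WatchSocket(fd, what);          // CURL_POLL_IN / CURL_POLL_OUT / CURL_POLL_INOUT
    }
    return 0;
}

// When the event loop later reports activity on fd, it is fed back to curl, e.g.:
//     int running = 0;
//     curl_multi_socket_action(multi_handle_m, fd, CURL_CSELECT_IN /* or CURL_CSELECT_OUT */, &running);
//     CheckFinishedConnections();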

Log prints from the above code, where new requests are sent out.

{"2024-03-28T15:29:25.325770","Starting a connection :"http://192.168.1.8:9001/code/1"}
{"2024-03-28T15:29:26.334195","Starting a connection :"http://192.168.1.8:9001/code/2"}
{"2024-03-28T15:29:27.302334","Starting a connection :"http://192.168.1.8:9001/code/3"}
{"2024-03-28T15:29:28.334490","Starting a connection :"http://192.168.1.8:9001/code/4"}
{"2024-03-28T15:29:29.290601","Starting a connection :"http://192.168.1.8:9001/code/5"}
{"2024-03-28T15:29:30.302908","Starting a connection :"http://192.168.1.8:9001/code/6"}
{"2024-03-28T15:29:31.290790","Starting a connection :"http://192.168.1.8:9001/code/7"}
{"2024-03-28T15:29:32.299960","Starting a connection :"http://192.168.1.8:9001/code/8"}
{"2024-03-28T15:29:33.310006","Starting a connection :"http://192.168.1.8:9001/code/9"}
{"2024-03-28T15:29:34.327041","Starting a connection :"http://192.168.1.8:9001/code/10"}

Log prints from the above code, tracing the responses/timeouts reported by curl in CheckFinishedConnections(). Also captured inline (after the arrow) is the actual time taken from when the request was handed to the curl library until the result was reported, i.e. the difference between the corresponding print above and the print below.

{"2024-03-28T15:29:27.880135","Connection timeout after 2503 ms","connection":"http://192.168.1.8:9001/code/1","result":28} --> 2.55446s
{"2024-03-28T15:29:30.481484","Connection timeout after 2548 ms","connection":"http://192.168.1.8:9001/code/2","result":28} --> 4.14733s
{"2024-03-28T15:29:33.005194","Connection timed out after 2524 milliseconds","connection":"http://192.168.1.8:9001/code/3","result":28} --> 5.70291s
{"2024-03-28T15:29:35.534235","Connection timed out after 2528 milliseconds","connection":"http://192.168.1.8:9001/code/4","result":28} --> 7.1999s
{"2024-03-28T15:29:35.534352","Failed to resolve host '192.168.1.8' with timeout after 0 ms","connection":"http://192.168.1.8:9001/code/5","result":28} --> 6.24379s
{"2024-03-28T15:29:35.534406","Failed to resolve host '192.168.1.8' with timeout after 0 ms","connection":"http://192.168.1.8:9001/code/6","result":28} --> 5.23156s
{"2024-03-28T15:29:38.034624","Connection timed out after 2501 milliseconds","connection":"http://192.168.1.8:9001/code/8","result":28} --> 5.73471s
{"2024-03-28T15:29:38.086514","Failed to resolve host '192.168.1.8' with timeout after 0 ms","connection":"http://192.168.1.8:9001/code/7","result":28} --> 6.79586s
{"2024-03-28T15:29:40.587947","Connection timed out after 2501 milliseconds","connection":"http://192.168.1.8:9001/code/9","result":28} --> 7.27811s
{"2024-03-28T15:29:40.588108","Failed to resolve host '192.168.1.8' with timeout after 0 ms","connection":"http://192.168.1.8:9001/code/10","result":28} --> 6.26132s

We are relying on curl's connect timeout in order to take some action on the application side. But here, as you can see, having set a 2.5 s connect timeout, the expiry happens over a wide range of times (2.5 s to 7 s). This is a small example where one request was attempted per second for 10 seconds (so ten requests in total).
I have also seen cases where it goes up to 15 seconds. I also tried increasing the connect timeout to a higher value (15 s), but then the timeouts were in the range of 15 to 120 seconds. Is there some explanation for this? I want to know whether the way we are using CURLMOPT_TIMERFUNCTION is proper or not.

Also, as an FYI, we are running an event loop with a granularity of 50 milliseconds, so all the application timer events (timer_m above) get polled at this frequency, and I have traced the logs to confirm that the timer polls are happening properly on the application side.
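
Roughly, the timer handling in that loop amounts to something like this (timer_m.expired() here is just a placeholder for our internal timer check, not a real API):

    while (true) {
        // one eventloop tick, roughly every 50 ms
        if (timer_m.expired()) {
            TimerTriggered();   // drives curl_multi_socket_action(..., CURL_SOCKET_TIMEOUT, ...)
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(50));   // needs <thread> and <chrono>
    }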

I expected the following

A connect attempt to a non-existent IP should time out within the configured timeout, or at least in its near vicinity (+1 s).

curl/libcurl version

curl 7.84.0

operating system

Fedora release 37

@bagder
Member

bagder commented Mar 30, 2024

can you provide a complete stand-alone example that reproduces the problem?

@Smackd0wn

Not sure if the same issue as the original poster, but let's try this with the command line tool:
time curl -vZm 3 'http://127.0.0.1:1234/[1-50]'
I have a non-responding TCP server running on port 1234, started with socat tcp-l:1234,fork -. curl behaved as expected, i.e., it opened 50 connections in parallel and timed out after 3 seconds.

However, run this:
time curl -vZm 3 --parallel-max 25 'http://127.0.0.1:1234/[1-50]'
and it took 24 seconds to finish. But it should be 6 seconds: the first batch of 25 connections should time out after 3 seconds, followed by the second batch of 25 connections, which times out after another 3 seconds.

Interestingly, curl behaves as expected (timed out after 6 seconds) if any of the following is met:

  • curl_multi_setopt(multi, CURLMOPT_PIPELINING, 0)
  • --parallel-immediate which does curl_easy_setopt(curl, CURLOPT_PIPEWAIT, 0)
  • curl release <8.1.0 (e.g., 8.0.1), i.e., this behaviour is probably a regression
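
For the libcurl API, the first two of those translate roughly to the following (an untested sketch, reusing the handle names from the reproducer earlier in this issue):

  // 1) disable multiplexing at the multi handle level
  curl_multi_setopt(multi_handle_m, CURLMOPT_PIPELINING, 0L);

  // 2) keep multiplexing enabled, but do not wait for an existing connection
  //    to confirm multiplex support (this is what --parallel-immediate does)
  curl_easy_setopt(easy_handle, CURLOPT_PIPEWAIT, 0L);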

@rahulKrishnaM
Author

can you provide a complete stand-alone example that reproduces the problem?

Hi @bagder, I haven't tried building a fully stand-alone program, since I would have to replace the event loop part with a publicly available one and integrate it with the curl logic; we are using our own event loop mechanism which is already in place. But the curl piece of the code is exactly what I have captured above.

I was just concerned whether there is anything wrong with my expectations, or whether any of the curl options I am setting could add to the delay we are seeing in my experiments.

@rahulKrishnaM
Author

Interestingly, curl behaves as expected (timed out after 6 seconds) if any of the following is met:

  • curl_multi_setopt(multi, CURLMOPT_PIPELINING, 0)
  • --parallel-immediate which does curl_easy_setopt(curl, CURLOPT_PIPEWAIT, 0)
  • curl release <8.1.0 (e.g., 8.0.1), i.e., this behaviour is probably a regression

Hi @Smackd0wn, I will try out options 1 and 2 you suggested above, just to confirm whether the same thing is happening for me as well. However, regarding point 3, are you saying the delay shouldn't be seen with curl versions below 8.1.0? The curl version I am using is 7.84.0, and I am seeing this issue there.

@rahulKrishnaM
Author

Hi @Smackd0wn, seems like you were spot on!

I tried out a few more combinations and saw that if I DISABLE CURLOPT_PIPEWAIT, things work as expected: the connection timeout happens at exactly the configured time (2.5 seconds, as below).

With curl_easy_setopt(curl, CURLOPT_PIPEWAIT, 0):

{"2024-04-02T07:40:12.633122","Starting a connection :"http://192.168.1.8:9001/code/1"}
{"2024-04-02T07:40:13.632655","Starting a connection :"http://192.168.1.8:9001/code/2"}
{"2024-04-02T07:40:14.632545","Starting a connection :"http://192.168.1.8:9001/code/3"}
{"2024-04-02T07:40:15.585072","Starting a connection :"http://192.168.1.8:9001/code/4"}
{"2024-04-02T07:40:16.632501","Starting a connection :"http://192.168.1.8:9001/code/5"}
{"2024-04-02T07:40:17.632728","Starting a connection :"http://192.168.1.8:9001/code/6"}
{"2024-04-02T07:40:18.633872","Starting a connection :"http://192.168.1.8:9001/code/7"}
{"2024-04-02T07:40:19.633042","Starting a connection :"http://192.168.1.8:9001/code/8"}
{"2024-04-02T07:40:20.633137","Starting a connection :"http://192.168.1.8:9001/code/9"}
{"2024-04-02T07:40:21.634283","Starting a connection :"http://192.168.1.8:9001/code/10"}
{"2024-04-02T07:40:15.182251","Connection timeout after 2549 ms","connection":"http://192.168.1.8:9001/code/1","result":28} --> 2.54923s
{"2024-04-02T07:40:16.181521","Connection timeout after 2549 ms","connection":"http://192.168.1.8:9001/code/2","result":28} --> 2.54895s
{"2024-04-02T07:40:17.183281","Connection timeout after 2549 ms","connection":"http://192.168.1.8:9001/code/3","result":28} --> 2.5515s
{"2024-04-02T07:40:18.134850","Connection timeout after 2549 ms","connection":"http://192.168.1.8:9001/code/4","result":28} --> 2.54983s
{"2024-04-02T07:40:19.185437","Connection timeout after 2549 ms","connection":"http://192.168.1.8:9001/code/5","result":28} --> 2.55308s
{"2024-04-02T07:40:20.182822","Connection timeout after 2550 ms","connection":"http://192.168.1.8:9001/code/6","result":28} --> 2.55021s
{"2024-04-02T07:40:21.183944","Connection timeout after 2549 ms","connection":"http://192.168.1.8:9001/code/7","result":28} --> 2.55018s
{"2024-04-02T07:40:22.183992","Connection timeout after 2550 ms","connection":"http://192.168.1.8:9001/code/8","result":28} --> 2.55118s
{"2024-04-02T07:40:23.183388","Connection timeout after 2550 ms","connection":"http://192.168.1.8:9001/code/9","result":28} --> 2.5504s
{"2024-04-02T07:40:24.186057","Connection timeout after 2551 ms","connection":"http://192.168.1.8:9001/code/10","result":28} --> 2.5519s

However, once I ENABLE CURLOPT_PIPEWAIT, the connection timeout starts drifting, as captured in the original ticket description.

Is this a known issue? Has any recent release delivered a fix for this? How do I use this option and still have the timeout happen within the configured time?

@Smackd0wn

Is this a known issue? Has any recent release delivered a fix for this? How do I use this option and still have the timeout happen within the configured time?

Probably not a known issue. I think using the PIPEWAIT option should not interfere with timeouts, so it's actually a bug in (lib)curl, not in your usage of the library. I don't think the latest release (8.7.1) fixes this issue.

Unfortunately I have not yet found the root cause of the issue. But my test commands above using the curl command line tool should make it very easy to reproduce the problem. And I believe a fix for the command line tool will probably fix your problem as well.

@Smackd0wn

After running git bisect, the issue is likely caused by commit 843b3ba by @bagder

Here is my understanding of the problem in the curl tool:

  1. Because HTTP connections may be multiplexed, a non-responding HTTP server will keep all new connections in a "may be multiplexed later" state.

    curl/lib/url.c

    Lines 962 to 969 in df3d5ef

    if(IsMultiplexingPossible(data, needle)) {
      if(bundle->multiuse == BUNDLE_UNKNOWN) {
        if(data->set.pipewait) {
          infof(data, "Server doesn't support multiplex yet, wait");
          *waitpipe = TRUE;
          CONNCACHE_UNLOCK(data);
          return FALSE; /* no reuse */
        }
  2. With pipewait on, Curl_connect() returns error CURLE_NO_CONNECTION_AVAILABLE.

    curl/lib/url.c

    Lines 3669 to 3722 in df3d5ef

    if(waitpipe)
      /* There is a connection that *might* become usable for multiplexing
         "soon", and we wait for that */
      connections_available = FALSE;
    else {
      /* this gets a lock on the conncache */
      struct connectbundle *bundle =
        Curl_conncache_find_bundle(data, conn, data->state.conn_cache);

      if(max_host_connections > 0 && bundle &&
         (bundle->num_connections >= max_host_connections)) {
        struct connectdata *conn_candidate;

        /* The bundle is full. Extract the oldest connection. */
        conn_candidate = Curl_conncache_extract_bundle(data, bundle);
        CONNCACHE_UNLOCK(data);

        if(conn_candidate)
          Curl_disconnect(data, conn_candidate, FALSE);
        else {
          infof(data, "No more connections allowed to host: %zu",
                max_host_connections);
          connections_available = FALSE;
        }
      }
      else
        CONNCACHE_UNLOCK(data);
    }

    if(connections_available &&
       (max_total_connections > 0) &&
       (Curl_conncache_size(data) >= max_total_connections)) {
      struct connectdata *conn_candidate;

      /* The cache is full. Let's see if we can kill a connection. */
      conn_candidate = Curl_conncache_extract_oldest(data);
      if(conn_candidate)
        Curl_disconnect(data, conn_candidate, FALSE);
      else {
        infof(data, "No connections available in cache");
        connections_available = FALSE;
      }
    }

    if(!connections_available) {
      infof(data, "No connections available.");

      conn_free(data, conn);
      *in_connect = NULL;

      result = CURLE_NO_CONNECTION_AVAILABLE;
      goto out;
    }
  3. Because of that error, the handle is set to the PENDING state.

    curl/lib/multi.c

    Lines 1930 to 1943 in df3d5ef

    result = Curl_connect(data, &async, &connected);
    if(CURLE_NO_CONNECTION_AVAILABLE == result) {
      /* There was no connection available. We will go to the pending
         state and wait for an available connection. */
      multistate(data, MSTATE_PENDING);

      /* add this handle to the list of connect-pending handles */
      Curl_llist_insert_next(&multi->pending, multi->pending.tail, data,
                             &data->connect_queue);
      /* unlink from the main list */
      unlink_easy(multi, data);
      result = CURLE_OK;
      break;
    }
  4. The aforementioned commit (843b3ba) unlinks PENDING items from the main list, so a pending handle is not checked during subsequent runs until all pending handles are added back by process_pending_handles(), which is triggered by certain events.

Prior to that commit, pending handles were not removed from the main list. If I comment out the following line, the bug goes away in my case.

curl/lib/multi.c

Lines 1939 to 1940 in df3d5ef

/* unlink from the main list */
unlink_easy(multi, data);

This change landed between 8.0.1 and 8.1.0, so I am less certain it is the same issue encountered by @rahulKrishnaM, who uses 7.84.0.

@rahulKrishnaM
Author

rahulKrishnaM commented Apr 4, 2024

Hi @Smackd0wn, I tried moving to curl version 8.1.2 and applying the patch you mentioned above. I still see the same behaviour as described in the ticket. I also re-ran the test with CURLOPT_PIPEWAIT disabled, and timeouts happen at the configured time as expected.

@bagder bagder removed the needs-info label Apr 4, 2024
@bagder bagder changed the title from "curl requests to non-existent ips not returning in connect_timeout duration" to "queued up transfers do not timeout correctly" Apr 4, 2024
@bagder bagder self-assigned this Apr 4, 2024
bagder added a commit that referenced this issue Apr 4, 2024
When there is a "change" in a multi handle and pending handles are moved
back to the main list to be retested if they can proceed further (for
example a previous transfer completed or a connection has a confirmed
multiplexed state), the timeout check in multi_runsingle() would not
trigger because it required an established connection.

This could make a pending transfer go back to pending state even though
it had been "in progress" for a longer time than permitted. By removing
the requirement for an associated connection, the timeout check will be
done properly even for transfers that have not yet been assigned one.

Fixes #13227
Reported-by: Rahul Krishna M
@bagder
Member

bagder commented Apr 4, 2024

Can you check if #13276 makes anything better?

@Smackd0wn

Can you check if #13276 makes anything better?

It caused a segmentation fault in multi_done() (backtrace through multi_handle_timeout() and multi_runsingle()), probably because of the removal of the data->conn check.

@rahulKrishnaM
Author

Hi @bagder, I tried out the patch, but I think I still see the drift. And after some time there was a crash, with the backtrace below (as @Smackd0wn reported).

Program terminated with signal SIGSEGV, Segmentation fault.
#0  multi_done (data=data@entry=0x5c0ef1940000, status=status@entry=CURLE_OPERATION_TIMEDOUT, premature=premature@entry=true) at multi.c:684
684     multi.c: No such file or directory.
[Current thread is 1 (Thread 0x734999fd1a00 (LWP 30))]
(gdb) bt
#0  multi_done (data=data@entry=0x5c0ef1940000, status=status@entry=CURLE_OPERATION_TIMEDOUT, premature=premature@entry=true) at multi.c:684
#1  0x000073499d51d7ed in multi_handle_timeout (data=data@entry=0x5c0ef1940000, now=now@entry=0x7ffcbccdb440,
    stream_error=stream_error@entry=0x7ffcbccdb3e6, result=result@entry=0x7ffcbccdb3e8, connect_timeout=connect_timeout@entry=false) at multi.c:1670
#2  0x000073499d51db06 in multi_runsingle (multi=0x5c0ef17ee1c0, nowp=0x7ffcbccdb440, data=0x5c0ef1940000) at multi.c:1929
#3  0x000073499d51f794 in multi_socket (multi=multi@entry=0x5c0ef17ee1c0, checkall=checkall@entry=false, s=<optimized out>, ev_bitmask=<optimized out>,    running_handles=0x7ffcbccdb4dc) at multi.c:3258
#4  0x000073499d51f8bf in curl_multi_socket_action (multi=0x5c0ef17ee1c0, s=<optimized out>, ev_bitmask=<optimized out>,    running_handles=<optimized out>) at multi.c:3380
<---application timer processing in eventloop--->
(gdb) f 0
#0  multi_done (data=data@entry=0x5c0ef1940000, status=status@entry=CURLE_OPERATION_TIMEDOUT, premature=premature@entry=true) at multi.c:684
684     in multi.c
(gdb) p conn
$1 = (struct connectdata *) 0x0

But, there was evident drift in timeout before this crash happened.

@bagder
Member

bagder commented Apr 4, 2024

there was a crash

I believe I've fixed that in an update

But, there was evident drift in timeout before this crash happened.

@Smackd0wn's curl -Z cmdline test went from 24 seconds to 6 seconds for me, so it does have some impact in my tests.

As we don't have a recipe for your case @rahulKrishnaM, it is harder for me to debug that.

@Smackd0wn

there was a crash

I believe I've fixed that in an update

But, there was evident drift in timeout before this crash happened.

@Smackd0wn's curl -Z cmdline test went from 24 seconds to 6 seconds for me, so it does have some impact in my tests.

Thank you very much for the fix. Indeed it fixed the segfault and improved the timeout behaviour; the timeout without any connections is now mostly correct. However, when I increased the numbers and ran curl -vZm 1 'http://127.1:1234/[1-500]' several times, the command sometimes took 12 or 11 seconds instead of 10. When it takes 12 or 11 seconds, some requests log Operation timed out after 2000 milliseconds with 0 bytes received instead of ~1000 milliseconds.

@bagder
Member

bagder commented Apr 5, 2024

I believe the reason for this is related to the optimization I did to fix #2369, which is not really optimal when there are many transfers pending due to PIPEWAIT... I have a few ideas to work on.

@rahulKrishnaM
Author

I have yet to try the latest changes, @bagder. However, when I said the drift was still there: I was testing with a higher timeout so that the variation is evident. With a connect timeout of 25 seconds, I saw the timeouts vary as 25.0496, 24.0507, 48.1019 and 72.1 seconds before the program crashed.

This is the pattern of logs when timeouts happen.

"Failed to connect to 192.168.1.8 port 9001 after 25048 ms: Timeout was reached","connection":"http://192.168.1.8:9001/code/1","result":28}
"Failed to connect to 192.168.1.8 port 9001 after 0 ms: Operation was aborted by an application callback","connection":"http://192.168.1.8:9001/code/2","result":42}
"Failed to connect to 192.168.1.8 port 9001 after 25048 ms: Timeout was reached","connection":"http://192.168.1.8:9001/code/3","result":28}
"Connection timed out after 25000 milliseconds","connection":"http://192.168.1.8:9001/code/4","result":28}

@bagder bagder changed the title from "queued up transfers do not timeout correctly" to "queued up PIPEWAIT transfers do not timeout correctly" Apr 8, 2024
bagder added a commit that referenced this issue Apr 12, 2024
When there is a "change" in a multi handle and pending handles are moved
back to the main list to be retested if they can proceed further (for
example a previous transfer completed or a connection has a confirmed
multiplexed state), the timeout check in multi_runsingle() would not
trigger because it required an established connection.

This could make a pending transfer go back to pending state even though
it had been "in progress" for a longer time than permitted. By removing
the requirement for an associated connection, the timeout check will be
done properly even for transfers that have not yet been assigned one.

Ref #13227
Reported-by: Rahul Krishna M
Closes #13276
@bagder
Member

bagder commented Apr 15, 2024

I merged #13276 as a first step towards a fix.

Now I offer #13371 as second step, which seems to take it further.

bagder added a commit that referenced this issue Apr 15, 2024
Since we can go to the CONNECT state from PENDING, potentially multiple
times for a single transfer, this change introduces a SETUP state that
happens before CONNECT when doing a new transfer.

Now, doing a redirect on a handle goes back to SETUP (not CONNECT like
before) and we initialize the connect timeout etc in SETUP. Previously,
we would do it in CONNECT but that would make it unreliable in cases
where a transfer goes in and out between CONNECT and PENDING multiple
times.

SETUP is transient, so the handle never actually stays in that state.

Additionally: take care of timeouts of PENDING transfers in
curl_multi_perform()

Ref: #13227
Closes #13371
@bagder
Member

bagder commented Apr 15, 2024

@rahulKrishnaM please try again with current master and let us know if there is any difference

@rahulKrishnaM
Author

Hi @bagder, I need to check this. I will get back on how my testing goes with the latest commit changes.

@bagder
Member

bagder commented Apr 30, 2024

Presumed to be fixed.

@bagder bagder closed this as completed Apr 30, 2024