FTPS upload of large file (800 GB) using TLS 1.3 gets slower and slower after ~4.5h and 360 GB #13097

Open
YvesFoltys opened this issue Mar 11, 2024 · 9 comments

Comments

@YvesFoltys

I did this

Uploaded a file of >800 GB to a server

# ls -l testfile
-rw-r-----    1 root     swsupt   887270539264 Nov 19 07:50 testfile

# curl -T testfile --netrc --insecure --ssl-reqd ftp://1.2.3.4//targetdir/testfile

The upload started as expected, but after ~4.5 h and ~360 GB the transfer rate dropped from ~23 MB/s to 500 KB/s and kept getting slower (~200 KB/s after 5 h).

With the TLS version limited to 1.2, the upload completes in ~10 h at a constant 23 MB/s:
# curl --tlsv1.2 --tls-max 1.2 -T testfile --netrc --insecure --ssl-reqd ftp://1.2.3.4//targetdir/testfile

Using plain ftps, the upload also completes in ~10 h at a constant 23 MB/s:
ftp -s 1.2.3.4

The server runs on IBM i 7.3.0 410. According to IBM i support, this issue may be caused by the curl FTP client not responding to TLSv1.3 rekey requests.
Looking through the known issues, it may also be related to https://curl.se/docs/knownbugs.html#FTPS_upload_data_loss_with_TLS_1
I can't say what happens if the slow upload completes, since it would take ages to let it run to completion.

I expected the following

The upload should complete without loss in speed using TLS 1.3

curl/libcurl version

curl -V

curl 8.5.0 (powerpc-ibm-aix7.1.5.0) libcurl/8.5.0 OpenSSL/1.1.1v zlib/1.2.13 libssh2/1.10.0 nghttp2/1.58.0 OpenLDAP/2.5.16
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz NTLM SPNEGO SSL threadsafe UnixSockets

operating system

AIX 7200-05-07-2346

@icing
Contributor

icing commented Mar 11, 2024

Do you have a server we can test this against?

@bagder
Member

bagder commented Mar 11, 2024

this issue may be caused by the Curl FTP client not responding to TLSv1.3 rekey requests

Unlikely. Such a connection would be closed. But also, why would it not handle rekeying? If that is even used here.

Also: is there anything that argues against this instead being an issue in the server end?

@YvesFoltys
Author

Do you have a server we can test this against?

Unfortunately I don't have a public server

Also: is there anything that argues against this instead being an issue in the server end?

I have to admit that "ftp -s" working was proof to me that the server side works. I just started a new session and captured an iptrace, and found that "ftp -s" uses TLS v1.2, which also works for curl...
I will try to find another way to test TLS v1.3 without curl being involved, to check whether that works.

@YvesFoltys
Author

Is this strictly internal / on a LAN? Can you be 100% confident that it's not the hosting company / ISP / bandwidth provider bottlenecking for fear of a DDoS attack or something?

Yes, this is internal only. And yes, since we manage the infrastructure as well, and the same file between the same systems works with TLS v1.2, I'm certain that no switch, router or firewall causes the problem.

@YvesFoltys
Author

I ran several tests from different systems targeting the same server:

OS       | Tool        | TLS | result
---------------------------------------------------------------------
AIX      | curl 7.61.1 | 1.3 | slows down
AIX      | curl 7.61.1 | 1.2 | works
AIX      | curl 8.6.0  | 1.3 | slows down
AIX      | ftps        | 1.2 | works
RHEL 8.9 | curl 7.61.1 | 1.3 | slows down
RHEL 8.9 | lftp        | 1.3 | works
RHEL 9.3 | curl 7.76.1 | 1.3 | aborts with "Connection reset by peer, errno 104"
RHEL 9.3 | curl 8.6.0  | 1.3 | slows down

So as I see it, FTPS upload over TLS 1.3 works fine in general, as the lftp run shows. Therefore I would rule out a server-side issue.
I found the first result on RHEL 9.3 (Connection reset by peer) interesting. I tested the upload 3 times and the error always occurred at 362 GB after ~270 min (+/- 2 min). That is pretty much the point at which the other curl uploads using TLS 1.3 started to slow down. I don't know why that error occurred, but merely updating curl changed the result back to the upload slowing down.
It seems to me as if the same issue occurred but was handled differently by the different versions.
Hopefully that helps in pinning down the problem. I'm currently out of ideas for what else to test.

@BrianInglis
Contributor

Could you run the test the other way round - to any of the other systems?

@YvesFoltys
Author

Unfortunately I can't run it the other way around. But I tested again with AIX and curl 8.6 as the client against another server running a more recent IBM i version (7.5 vs. 7.3). Although the overall upload speed was better due to different infrastructure between the systems, the transfer rate again dropped after 362 GB:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 44  826G    0     0   44  364G      0   109M  2:08:50  0:56:45  1:12:05 2386k

So this seems to be related to the amount of data rather than the time.
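
As a rough check of that conclusion, the figures quoted in this thread can be put side by side. The sketch below is only arithmetic on those numbers; it assumes that the "G" in curl's progress meter means GiB (2^30 bytes), and the run labels and the `avg_rate_mib_s` helper are names made up for illustration.

```python
# Figures quoted in this thread. Assumption: "G"/"GB" here means GiB.
runs = {
    "IBM i 7.3 server": {"gib": 362, "seconds": 270 * 60},      # ~270 min
    "IBM i 7.5 server": {"gib": 364, "seconds": 56 * 60 + 45},  # 0:56:45
}

def avg_rate_mib_s(gib, seconds):
    """Average transfer rate in MiB/s for `gib` GiB sent in `seconds`."""
    return gib * 1024 / seconds

for name, run in runs.items():
    rate = avg_rate_mib_s(run["gib"], run["seconds"])
    print(f"{name}: {run['gib']} GiB after {run['seconds'] / 60:.0f} min "
          f"(~{rate:.0f} MiB/s average)")

slow, fast = runs["IBM i 7.3 server"], runs["IBM i 7.5 server"]
size_ratio = fast["gib"] / slow["gib"]
time_ratio = slow["seconds"] / fast["seconds"]
print(f"bytes differ by a factor of {size_ratio:.2f}, "
      f"elapsed time by a factor of {time_ratio:.1f}")
```

The byte counts agree to within about 1% while the elapsed times differ by almost a factor of five, which fits a byte-based trigger better than a time-based one.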

@bagder
Member

bagder commented Apr 12, 2024

Obviously there is no code or condition anywhere in curl that does anything different after some specific amount of bytes transferred or time spent.

My wild guess is that something is done on the TLS layer after some specific time and that triggers a different code path or something in OpenSSL that makes it run slower.

It would be interesting to know if curl built with another TLS library or even a current OpenSSL version would behave differently.

@BrianInglis
Contributor

Could this be due to TLS session key renegotiation after some byte or time limit?
This could depend on the underlying stacks.
Are both ends running the same TLS/SSL stacks, what versions are those stacks, and what are those TLS session key renegotiation byte or time limits?
It may be time to look at a run with -v, --verbose or some --trace... option(s) that do not log 400GB+ data!
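
One byte limit can be stated without knowing either stack: RFC 8446 (TLS 1.3), section 5.5, says that for AES-GCM up to 2^24.5 full-size records may be encrypted under one set of keys, after which a key update is expected. The sketch below converts that figure to bytes. It assumes an AES-GCM cipher suite, full-size records of 2^14 bytes, and binary units (GiB, MiB); none of these is confirmed in this thread, and an implementation is free to pick its own threshold.

```python
# RFC 8446, section 5.5: AES-GCM limit of 2^24.5 full-size records per key.
RECORDS_PER_KEY = 2 ** 24.5
# Assumption: every record carries the maximum 2^14 bytes of plaintext.
MAX_PLAINTEXT_PER_RECORD = 2 ** 14

limit_bytes = RECORDS_PER_KEY * MAX_PLAINTEXT_PER_RECORD
limit_gib = limit_bytes / 2 ** 30

# Time to reach the limit at the ~23 MB/s reported above (assumed MiB/s).
minutes_at_23_mib_s = limit_bytes / (23 * 2 ** 20) / 60

print(f"{limit_bytes:.3e} bytes = {limit_gib:.1f} GiB")
print(f"reached after ~{minutes_at_23_mib_s:.0f} min at 23 MiB/s")
```

Under these assumptions the limit works out to roughly 362 GiB, reached after roughly 269 minutes at 23 MiB/s. That is close to the reported 362 GB and ~270 min, which is suggestive of a key update at that point but not proof of one.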
