Use multithreaded zstd compression #8217

Forza-tng · 2024-05-12T19:20:40Z

Have you checked borgbackup docs, FAQ, and open GitHub issues?

YES

Is this a BUG / ISSUE report or a QUESTION?

ISSUE / QUESTION

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

Borg 1.2.8

Operating system (distribution) and version.

Gentoo Linux

Hardware / network configuration, and filesystems used.

amd64, btrfs, fiber 1gbit/s. Remote storage via ssh.

How much data is handled by borg?

~1TB

Full borg command line that led to the problem (leave away excludes and passwords)

borg create --compression zstd,10

Describe the problem you're observing.

When using too high a compression level, the Borg process gets pinned at 100% CPU on one core.

zstd supports multithreading, which can greatly improve its performance. zstd -T1 shows roughly the same bandwidth I see with Borg, which leads me to believe that the MT option is not enabled.

https://pyzstd.readthedocs.io/en/latest/#mt-compression

Another improvement, if not already used, is to use the --long option. It allows zstd to use a bigger window for higher gains.

ThomasWaldmann · 2024-05-12T19:43:53Z

Multithreading is planned for after borg 2.

The problem with approaches like the compressor internally implementing multithreading is that borg has a chunk size of typically 2MiB (and that only applies if the file is larger than that, but a lot of files are smaller). This already relatively small chunk then has to get split into e.g. 4 pieces (making each one much smaller still), and 4 threads have to get started and terminated. So there is a lot of overhead and only a little data to compress per thread.
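To make the sizes concrete, here is a rough sketch of the arithmetic, assuming the 2MiB chunk and the 4 threads mentioned above. The helper name `piece_size` is made up for illustration; it is not borg or libzstd code, and it ignores how libzstd actually partitions work internally.

```python
MIB = 1024 * 1024

def piece_size(chunk_size: int, n_threads: int) -> int:
    """Bytes each thread would get if a chunk were split evenly (rounded up)."""
    return -(-chunk_size // n_threads)  # ceiling division

chunk = 2 * MIB  # typical chunk size, per the comment above
for threads in (1, 2, 4, 8):
    kib = piece_size(chunk, threads) // 1024
    print(f"{threads} thread(s): {kib} KiB per thread")
```

With 4 threads, each one would see only 512 KiB of a 2MiB chunk, and files smaller than the chunk size leave even less per thread.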

Considering that this is only needed for new data and a lot of data usually doesn't change (and thus doesn't need compression), it is usually not a big concern for the 2nd+ backup. Exceptions are users with a ton of new data each day and also first time backups.

A better approach is to have borg implement multithreading (and pipelining), so that the chunks don't need to get split further into smaller pieces. But that is a very big change and other big changes (see master branch) have higher prio.
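As a rough illustration of that idea (not borg's design, which does not exist yet), the sketch below compresses whole chunks in parallel instead of splitting each chunk. It uses `zlib` from the Python standard library as a stand-in for zstd, and the function name `compress_chunks` and its parameters are assumptions made up for this example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(chunks, level=6, workers=4):
    """Compress every chunk whole, several chunks at a time.

    zlib is only a stand-in for zstd here. pool.map returns results
    in input order, so the chunk sequence is preserved.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))

# Eight synthetic 2 MiB chunks (hypothetical data, highly compressible).
chunks = [bytes([i]) * (2 * 1024 * 1024) for i in range(8)]
compressed = compress_chunks(chunks)
restored = [zlib.decompress(c) for c in compressed]
print(len(compressed), restored == chunks)
```

Each worker gets a full chunk, so nothing is cut into smaller pieces, and the worker threads are started once per batch rather than once per chunk. A real pipeline would also have to overlap reading, chunking, hashing, and writing, which is what makes it a big change.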

ThomasWaldmann · 2024-05-12T19:56:23Z

There is also #37 and #3500 for more details.

Forza-tng · 2024-05-12T21:12:28Z

> There is also #37 and #3500 for more details.

Thanks. Those two are mainly about multithreading of Borg itself, while I was only considering enabling the multithread flag in the zstd library used. This should not need many changes, as it is more or less a variable to set.

Edit: Didn't see your first message. Full multithreading in borg could definitely improve things, but it is a much larger task, so it's understandably not a priority for Borg 1.x.

ThomasWaldmann · 2024-05-13T10:15:31Z

@Forza-tng I somehow suspect (for the reasons I stated in my post above) that just giving that flag to zstd won't improve things much.

Can you do a practical experiment, implementing that change and comparing performance?

Also, we don't use pyzstd, but we directly call the libzstd api via Cython, see compress.pyx.
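A minimal timing harness for such an experiment might look like the sketch below. It is only an assumption about how one could measure it: `throughput_mib_s` is a hypothetical helper, the sample data is synthetic, and `zlib` stands in for the compressor under test. For a real comparison, the `compress` callable would wrap the patched and unpatched borg compressor, and the chunks would come from real backup data.

```python
import time
import zlib

def throughput_mib_s(compress, chunks, repeats=3):
    """Best-of-N throughput in MiB/s for compressing all chunks once."""
    total = sum(len(c) for c in chunks)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for c in chunks:
            compress(c)
        best = min(best, time.perf_counter() - start)
    return total / (1024 * 1024) / best

# Four synthetic 1 MiB chunks; real data would behave differently.
sample = bytes(range(256)) * 4096
chunks = [sample] * 4

for level in (1, 9):
    rate = throughput_mib_s(lambda c: zlib.compress(c, level), chunks)
    print(f"zlib level {level}: {rate:.1f} MiB/s")
```

Taking the best of several runs reduces noise from other processes. Measuring per chunk rather than per file matters here, because the chunk is the unit the compressor actually sees.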
