
Add file concat option #1208

Closed
daniel-lucio opened this issue May 2, 2024 · 4 comments
Comments

@daniel-lucio

Description

The same way there is an api/bus/objects/copy endpoint, it would be wonderful to have an api/bus/objects/concatenate endpoint, or to add concatenation support to the copy endpoint.
The current copy endpoint expects a body like this:

{
    "sourceBucket": "default",
    "sourcePath": "/sample.jpeg",
    "destinationBucket": "default",
    "destinationPath": "/samples/sample.jpeg"
}
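
For reference, a minimal curl invocation of the existing copy endpoint might look like the sketch below; the address, port, password variable, and the assumption that it accepts a POST are illustrative, not verified details:

curl -u ":$RENTERD_API_PASSWORD" -X POST \
  "http://localhost:9980/api/bus/objects/copy" \
  -H "Content-Type: application/json" \
  -d '{"sourceBucket": "default", "sourcePath": "/sample.jpeg", "destinationBucket": "default", "destinationPath": "/samples/sample.jpeg"}'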

Maybe add support for something like:

{
    "sourceBucket": "default",
    "sourcePath": ["/1.txt", "/2.txt", "/3.txt", "/4.txt", ...],
    "destinationBucket": "default",
    "destinationPath": "/samples/concat.txt"
}

Version

v1.0.6

What operating system are you running (e.g. Ubuntu 22.04, macOS, Windows 11)?

Linux

Anything else?

No response

@ChrisSchinnerl
Member

What is the use-case you had in mind for this? It seems odd to me that you would want to combine independent files into a single one, unless it is for uploading large files in chunks, which you can achieve through the multipart upload API.

Keep in mind that combining files wouldn't mean reuploading them. It would just merge metadata.
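
For context, the multipart flow being suggested looks roughly like the sketch below. The worker part-upload endpoint matches the one visible in the curl trace later in this thread; the create and complete paths, the address, the password variable, and the placeholders are assumptions rather than verified API signatures:

# 1. Create the multipart upload and record the returned uploadID
#    (assumed bus endpoint).
curl -u ":$RENTERD_API_PASSWORD" -X POST \
  "http://localhost:9980/api/bus/multipart/create" \
  -d '{"bucket": "default", "path": "/huge.bin"}'

# 2. Upload each part through the worker (this endpoint appears in the
#    trace below; partnumber runs from 1 up to the 10,000-part cap).
curl -u ":$RENTERD_API_PASSWORD" -X PUT \
  "http://localhost:9980/api/worker/multipart/huge.bin?bucket=default&uploadid=<uploadID>&partnumber=1" \
  --data-binary @part1.bin

# 3. Complete the upload with the list of uploaded parts
#    (assumed bus endpoint).
curl -u ":$RENTERD_API_PASSWORD" -X POST \
  "http://localhost:9980/api/bus/multipart/complete" \
  -d '{"bucket": "default", "path": "/huge.bin", "uploadID": "<uploadID>", "parts": [...]}'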

@daniel-lucio
Author

Yes, it has to do with uploading huge files. I have to work around the restrictions that libfuse imposes when combined with renterd. Merging metadata to get a bigger file is exactly what I am looking for.

Some context here.

In #1165, the answer was that Range support for PUT wouldn't happen. That means I can't upload a file larger than 128 KiB, because libfuse 3 only pushes data in 128 KiB blocks (libfuse 2 is even smaller, at 4 KiB). Also, please note that libfuse write operations are atomic: I only get the data, offset, and size of each write, so I can't know the total size of the write operation in advance.

Then I moved to multipart uploads. Multipart uploads have a 10,000-part restriction, and I was told on Discord that increasing that limit won't happen. This caps siafs at a file size of about 1.2 GB (128 KiB times 10,000 parts), which is not small, but it is very common for a file (such as a video or a high-quality image) to be larger than that.
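
The cap works out as follows (plain shell arithmetic, for illustration):

echo $((128 * 1024 * 10000))   # 1310720000 bytes, i.e. about 1.2 GB per file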

Then I moved to a buffered approach in which I saved the first 1 GB locally before pushing it as a single multipart part. This approach would raise the file limit to about 10 TB (1 GB times 10,000 parts). But sadly, renterd has issues when uploading a big part; in my tests, a 76 MB part produces the following error:

*   Trying 192.168.7.52:9880...
* Connected to 192.168.7.52 (192.168.7.52) port 9880 (#0)
* Server auth using Basic with user ''
> PUT /api/worker/multipart%2Fn1.txt?bucket=default&uploadid=2e942a020f25cb11630e76a9be756f1ad4b24dab94a5ffcd85e531945690ec45&partnumber=1 HTTP/1.1
Host: 192.168.7.52:9880
Authorization: Basic OmYxbDNtMG4zbGdyNG5kMw==
Accept: */*
Content-Length: 78888897
Content-Type: application/octet-stream
Expect: 100-continue

< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< X-Content-Type-Options: nosniff
< Date: Thu, 02 May 2024 00:47:41 GMT
< Transfer-Encoding: chunked
< 
couldn't upload object: failed to upload slab: launched=14 uploaded=1 remaining=5 inflight=0 pending=5 uploaders=14 errors=11 
95770d3b: failed to upload sector to contract fcid:96ad01c97e4089c86d420e2c40b8b8ea36f2deae51925177f489a21fd1ab5f5d, err: AppendSector: WriteResponse: stream was gracefully closed
..... (many lines of the same kind of error)
5bf06361: failed to upload sector to contract fcid:6446923cd00bbba770688246af1ab11f06623df2bf7780ebf146c0d79016c559, err: AppendSector: WriteResponse: stream was gracefully closed
04e326a6: failed to upload sector to contract fcid:f5f0a4924bdc2023cb2794dceeb0d6717f39a9b6e476dc0ccf78bdf55ee0be55, err: AppendSector: WriteResponse: stream was gracefully closed
9d0dc5d8: failed to upload sector to contract fcid:a979b5eaa6cc9fda6a98355adf57507412175d958760d2585e6925470f8ba27a, err: AppendSector: WriteResponse: stream was gracefully closed
* Connection #0 to host 192.168.7.52 left intact
Segmentation fault (core dumped)

Please note the Content-Length: 78888897 (76 MB); a file of that size is quite common. During my tests, a 6 MB part had no issue, and 6 MB times 10,000 is only around 60 GB.

So, I was looking at different ways to upload huge files, and since copying smaller files is not an issue, my approach would be to upload the small chunks into a hidden directory (.something) and, when libfuse reports that the write op has ended, concatenate them on the server side and copy the result to the proper path, as sketched below.
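
A sketch of that workaround, assuming the proposed concatenate endpoint existed (it does not today; the path, port, password variable, and staging names are purely illustrative):

# Hypothetical: merge the staged chunks into the final object.
curl -u ":$RENTERD_API_PASSWORD" -X POST \
  "http://localhost:9980/api/bus/objects/concatenate" \
  -d '{
    "sourceBucket": "default",
    "sourcePath": ["/.staging/huge.bin.000", "/.staging/huge.bin.001"],
    "destinationBucket": "default",
    "destinationPath": "/huge.bin"
  }'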

I am open to suggestions.

@ChrisSchinnerl
Member

Are you running this on testnet or mainnet? Uploading 76 MB parts shouldn't be an issue at all. When using rclone we actually recommend setting --s3-chunk-size to 120 MiB, which results in 120 MiB parts being uploaded.
My personal node also uses that size for uploads just fine on mainnet.
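
The corresponding rclone invocation might look like this; the remote name "renterd" is an assumption (an S3 remote pointed at renterd's S3 gateway):

rclone copy ./huge.bin renterd:default/ --s3-chunk-size 120M   # rclone sizes are binary, so 120M = 120 MiB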

To me it seems like this might be more of an issue with your configuration or testnet hosts than with the size of the parts, so I wouldn't discard the buffered approach just yet. There is a good chance the smaller parts worked because they weren't actually uploaded and were instead buffered for upload packing.
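
One way to check whether upload packing is what absorbed the small parts is to inspect the bus setting; the endpoint name below matches renterd v1.x's settings API but should be verified against your version's docs:

curl -u ":$RENTERD_API_PASSWORD" \
  "http://localhost:9980/api/bus/setting/uploadpacking"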

@daniel-lucio
Author

I am on testnet.

I use the default settings from renterd.

@ChrisSchinnerl closed this as not planned on Sep 19, 2024.