Article: options and tradeoffs around data import parameters #1176
Hey @lidel, feel free to assign me. I have time tomorrow for that. :)
@lidel wrote:
I may need some input here. I can't think of a reasonable explanation for why a size-based chunker is better than a rolling chunker. Maybe someone like @Stebalien can chime in and explain why the decision was made to use a size-based chunker by default. :)
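To make the size-based vs. rolling-chunker question concrete, here is a minimal Python sketch. It is not kubo's buzhash or rabin implementation: `content_defined_chunks` is a hypothetical, simplified rolling-hash chunker (a polynomial hash over a 48-byte window, with illustrative size limits), and `fixed_size_chunks` mimics a size-based chunker with a hypothetical 4 KiB chunk size. Inserting a single byte at the front of a file shifts every fixed-size boundary, while the content-defined boundaries re-align right after the first chunk, so almost all chunks (and therefore blocks) can be reused.

```python
import random


def fixed_size_chunks(data: bytes, size: int = 4096) -> list:
    """Split data at fixed offsets, like a size-based chunker."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 48, divisor: int = 2048,
                           min_size: int = 256, max_size: int = 8192) -> list:
    """Cut wherever a rolling hash of the last `window` bytes is 0 mod `divisor`.

    Simplified illustration only; NOT kubo's buzhash or rabin chunker.
    """
    base, mod = 257, (1 << 31) - 1           # polynomial rolling hash modulo a prime
    out_weight = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= window:                      # drop the oldest byte from the window
            h = (h - data[i - window] * out_weight) % mod
        h = (h * base + byte) % mod          # shift in the newest byte
        length = i + 1 - start
        at_boundary = i + 1 >= window and h % divisor == 0
        if (at_boundary and length >= min_size) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):                    # trailing partial chunk
        chunks.append(data[start:])
    return chunks


def shared_chunks(a: list, b: list) -> int:
    """Count distinct chunks present in both versions (i.e. reusable blocks)."""
    return len(set(a) & set(b))


size = 256 * 1024
original = random.Random(42).getrandbits(8 * size).to_bytes(size, "little")
edited = b"X" + original                     # one byte inserted at the front

fixed_a, fixed_b = fixed_size_chunks(original), fixed_size_chunks(edited)
cdc_a, cdc_b = content_defined_chunks(original), content_defined_chunks(edited)
shared_fixed = shared_chunks(fixed_a, fixed_b)
shared_cdc = shared_chunks(cdc_a, cdc_b)

print(f"fixed-size:      {shared_fixed}/{len(fixed_a)} chunks reused")
print(f"content-defined: {shared_cdc}/{len(cdc_a)} chunks reused")
```

The tradeoff is extra CPU for hashing every byte plus variable chunk sizes; the benefit is deduplication across edited versions of the same data.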
Correct me if I'm wrong, but it's just a little less overhead for data that is read from front to back anyway, so any file type that needs random access will be slowed down. Logs are not large enough to make any significant difference here, as you can easily fit a list of all chunks of a log in one block. One might think of zip-like archives, ISO files, or videos, but that's not actually the case either: zip files are random access, ISO files can be mounted without reading the whole image, and video streaming with seeking is pretty much the norm. I also cannot think of a really good use case here, so I would flag it as a "stable, but experimental" option.
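The claim that a log's whole chunk list fits in one block can be checked with a bit of arithmetic. This sketch assumes kubo's defaults at the time of writing: a `size-262144` chunker (256 KiB) and at most 174 links per node in the balanced layout. Both constants are assumptions to adjust for other settings, and `balanced_depth` is a simplified estimate, not kubo's exact DAG builder.

```python
CHUNK_SIZE = 262_144  # assumed default chunker: size-262144 (256 KiB)
MAX_LINKS = 174       # assumed default max links per node (balanced layout)


def leaf_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of leaf chunks (integer ceiling division, at least one)."""
    return max(1, -(-file_size // chunk_size))


def balanced_depth(file_size: int, chunk_size: int = CHUNK_SIZE,
                   fanout: int = MAX_LINKS) -> int:
    """Layers of link nodes needed above the leaves (simplified estimate)."""
    leaves = leaf_count(file_size, chunk_size)
    depth, capacity = 0, 1
    while capacity < leaves:
        capacity *= fanout
        depth += 1
    return depth


one_node_capacity = CHUNK_SIZE * MAX_LINKS
print(one_node_capacity)  # 45613056 bytes, i.e. 43.5 MiB

for size in (10 * 2**20, 2**30):  # a 10 MiB log and a 1 GiB file
    print(size, leaf_count(size), balanced_depth(size))
# 10485760 40 1
# 1073741824 4096 2
```

With these numbers, one node can link to about 43.5 MiB of data, so a log below that size needs only a single layer of links whichever layout is used.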
I feel like I may not be the right person to write this article after all :D I actually wrote a ticket to change this default, and I still think blake2b is the better default. :) So I guess the answer is "standards?" Or "legacy stuff we don't dare to change?"
So overall, the missing "why?" and rationale is what blocks me from writing it, since I'm of the opinion that these should be the standards, and I don't see a good reason to use anything else. :)
And I use them everywhere. So @lidel, if you could give some rationale for the whys (it doesn't even need to be full sentences), I'm happy to write it. Just stop me if it gets too detailed ;)
@RubenKelevra no need to write the whole thing; it is perfectly fine if you only write the sections you care about (even if it is only the chunker) and open a draft PR with that. We will fill the gaps :) You are right, many choices like the default chunker are legacy decisions. Just write that, and note that different implementations of IPFS are free to choose different defaults (e.g. blake2b). It would also be useful to give some "recipes" like the one you listed with blake2b and buzhash, and to elaborate on why one would prefer them over the "safe"/legacy defaults. :)
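One reason such defaults are hard to change is that the hash function is part of every CID. The sketch below (Python standard library only) builds bare multihashes for the same block with sha2-256 (multicodec `0x12`) and blake2b-256 (multicodec `0xb220`). It deliberately omits the CID version, content codec, and multibase encoding, so the bytes are multihashes, not full CIDs; the block content is a hypothetical example.

```python
import hashlib

SHA2_256 = 0x12       # multicodec code for sha2-256
BLAKE2B_256 = 0xB220  # multicodec code for blake2b-256


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)


def multihash(code: int, digest: bytes) -> bytes:
    """<hash function code><digest length><digest>"""
    return varint(code) + varint(len(digest)) + digest


block = b"hello ipfs\n"  # hypothetical block content
mh_sha = multihash(SHA2_256, hashlib.sha256(block).digest())
mh_blake = multihash(BLAKE2B_256, hashlib.blake2b(block, digest_size=32).digest())

print(mh_sha[:2].hex())    # 1220
print(mh_blake[:4].hex())  # a0e40220
print(mh_sha != mh_blake)  # True
```

Because the multihash differs, re-importing the same file with `--hash=blake2b-256` yields different CIDs than the sha2-256 default, and changing the chunker or `--raw-leaves` likewise changes the resulting DAG and its root CID.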
Alright. :)
Maybe we should just add a "--use-legacy-defaults" flag to the daemon (and as a global flag for all commands), so we no longer have to worry that people rely on the current defaults. This would also free us up to change the long-discussed default ports, for example, which we also don't dare to change for similar reasons. :) This way we can document the "legacy defaults" and why they were chosen once, and then explain why the new defaults are better. I feel that would make more sense when reading the docs, and also when using IPFS.
@lidel, triaging old issues: would you say this is still relevant?
@ElPaisano yes, I believe this is untapped potential in the IPFS ecosystem, and having some introductory docs might empower people to innovate in this area. There is a need for two articles (or one with two sections):
The goal would be to convey that chunking details are a userland feature: anyone can use the default chunking or roll their own.
Most people are OK with whatever chunker and hash function is the current default in commands that import data into IPFS. In the case of go-ipfs, these are `ipfs add`, `ipfs dag put`, and `ipfs block put`.

However, when doing `ipfs add`, one can not only use a custom `--chunker` and `--hash` function, but also choose to produce a TrickleDAG instead of the default balanced MerkleDAG by passing `--trickle`, enable or disable `--raw-leaves`, or even write one's own software that chunks, hashes, and assembles a UnixFS DAG in novel ways. One can go beyond that and import JSON data as dag-json or dag-cbor, creating data structures beyond regular files and directories.
We need an article that explains:

- … `--trickle` (… better suited for append-only data such as logs?)

Prior art:

- `--help` explainer around different chunkers: Expand rolling chunker documentation kubo#8952