-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for large, multi-part config messages #11
base: dev
Are you sure you want to change the base?
Conversation
LGTM. I made a fix for the clang-16 build issues at #15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it looks like this PR doesn't play nicely with the libQuic
changes due to the updated oxen-encoding
version (oxenc::bt_list_consumer btlc(data);
is no longer valid) - it seems we might need to wait until oxen-io/oxen-libquic#152 has been merged?
I worked around this for my testing by modifying btstream.cpp
to do this instead:
auto val = oxenc::const_span<std::byte>{req.data(), req.size()};
oxenc::bt_list_consumer btlc(val);
but that resulted in Cannot create a bt_list_consumer with no data
errors in the longs (and then I wasn't able to continue testing due to the config_push
assertion but I assume I would have been prevented from testing properly due to this hack)
- Update oxen-encoding to latest dev for new span support in bt code. - bring in oxen-logging so that we can use fmt; for now this only for fmt in the test suite, but we will definitely start using it in the future in libsession-util (the libquic PR already does). - added an overload of the test suite "printable" functions that takes format+args. - replace some `"..."_hex` string concatenations that broke when oxenc switched to constexpr _hex no longer return strings. - Fixed some oxen-related test code that broke where it was touching oxenc::detail functions.
This implements multi-part config messages, which have been stubbed in for a while but still needed an implementation. Such messages are split after post-compression, post-padding, but before encryption (i.e. each piece gets encrypted separately), and all but the last piece will be a maximum size message (76800 bytes). The Config object itself keeps track of partially received messages, joining them up as they arrive and processing them once it has a full set (or else expiring them if they are too old). It also tracks (without storing) recently fully processed multipart messages so that they can be discarded without pointlessly storing them if they happen to arrive again for some reason. All of this requires some (breaking) external API changes to implement: - `merge(...)` now returns an ordered_set of config messages that failed to merge rather than a vector; we couldn't keep track of the order anymore, because of the next point. - `merge(...)` now returns *all* the hashes that make up a multipart set when one gets merged, even if they weren't passed in in the current merge call. For instance if you have a three-chunk multipart config and merge parts 1 and 2, but not 3, then 1 and 2 will be preserved internally. When you later provide 3, then all of hash1, hash2, and hash3 get returned to indicate that all of those were processed as a part of this merge. - confirm_pushed() previously took a vector and implied the order matters (although we never previously returned more than one). It now takes an unordered_set to be consistent with merge() and current_hashes(), and because the order is irrelevant. - The C version of confirm_pushed (config_confirm_pushed) previously only accepted a single message hash + size (and would convert this into a one-element vector for confirm_pushed()). Its API changes quite considerably as it now has to pass an array of hashes and sizes and the size of those arrays. As with the C++ version, the order of elements here is unimportant. - The C++ `current_hashes()` previously only ever returned a 0- or 1-length vector; it now returns an unordered_set, and can easily return more than one (i.e. when the current config is a multipart config message). - Added a new `active_hashes()` method that returns the current hashes *plus* the hashes of any partially loaded multi-part configs for which we are awaiting remaining parts. When used for "do we want to keep this alive?" this is the more appropriate method to use compared to `current_hashes()`. Other multi-part related changes (which don't affect the API, but are worth noting): - multipart messages *do not* support (and must not be) protobuf wrapped, because multipart messages *already* will not be understood by clients not using this new code, and so anything supporting multipart should also have been fixed to not apply extra protobuf wrapping. We also are returning maximum size message parts, and the protobuf overhead will push it over the storage limit in a way that isn't trivial for libsession-util to determine and compensate for. Other changes not directly related to multipart, but included as part of this: - Started converting some use of ustring_view to `span<const unsigned char>` (alias: `ucspan`), adding some helper functions and changing some functions to take spans instead. For the most part this conversion can happen piecemeal as, luckily, ustring_views are implicitly convertible to `ucspan`'s, and so most places passing a ustring_view (or ustring) will not break. (The other way, i.e. ucspan to ustring_view, does not work, and so the conversion needs to happen from innermost functions out; this initial change is focussed on some of those inner functions). - fixed a bug where a duplicate config (multipart or otherwise) that gets merge()d would result in that config's hash ending up in *both* current hashes and old hashes, but it should not have gone into old hashes. - started converting some string concatenation and std::to_string usage to fmt and/or "..."_format instead now that we have those available (via oxen-logging).
_curr_hash = all_hashes[_config->unmerged_index()]; | ||
_curr_hashes.clear(); | ||
auto& hashes = all_hashes[_config->unmerged_index()]; | ||
_curr_hashes.insert(hashes.begin(), hashes.end()); | ||
} | ||
} else { | ||
// the merging affect nothing (if it had seqno would have been incremented), so don't | ||
// pointlessly replace the inner config object. | ||
assert(new_conf->unmerged_index() == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering whether this assert should be removed - just thinking that there are situations where a merge can occur with a config which is not going to result in a change
Eg. if you link a new device and, due to network conditions, it wasn't able to retrieve the existing configs then it would push up it's "new user" config, which would then be pulled down on your existing device, merged, and likely result in no changes right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I just realised that at the moment we consistently merge in the current config because when you push up a new config message we don't update the lastHash
value (so we don't unintentionally miss a config that was pushed up by another device at the same time)
Which means on the poll after pushing the current config we will always receive the current config back and try to merge it 🙃
This implements multi-part config messages, which have been stubbed in for a while but still needed an implementation.
Such messages are split after post-compression, post-padding, but before encryption (i.e. each piece gets encrypted separately), and all but the last piece will be a maximum size message (76800 bytes).
The Config object itself keeps track of partially received messages, joining them up as they arrive and processing them once it has a full set (or else expiring them if they are too old). It also tracks (without storing) recently fully processed multipart messages so that they can be discarded without pointlessly storing them if they happen to arrive again for some reason.
All of this requires some (breaking) external API changes to implement:
merge(...)
now returns an ordered_set of config messages that failed to merge rather than a vector; we couldn't keep track of the order anymore, because of the next point.merge(...)
now returns all the hashes that make up a multipart set when one gets merged, even if they weren't passed in in the current merge call. For instance if you have a three-chunk multipart config and merge parts 1 and 2, but not 3, then 1 and 2 will be preserved internally. When you later provide 3, then all of hash1, hash2, and hash3 get returned to indicate that all of those were processed as a part of this merge.current_hashes()
previously only ever returned a 0- or 1-length vector; it now returns an unordered_set, and can easily return more than one (i.e. when the current config is a multipart config message).active_hashes()
method that returns the current hashes plus the hashes of any partially loaded multi-part configs for which we are awaiting remaining parts. When used for "do we want to keep this alive?" this is the more appropriate method to use compared tocurrent_hashes()
.Other multi-part related changes (which don't affect the API, but are worth noting):
Other changes not directly related to multipart, but included as part of this:
Started converting some use of ustring_view to
span<const unsigned char>
(alias:ucspan
), adding some helper functions and changing some functions to take spans instead. For the most part this conversion can happen piecemeal as, luckily, ustring_views are implicitly convertible toucspan
's, and so most places passing a ustring_view (or ustring) will not break. (The other way, i.e. ucspan to ustring_view, does not work, and so the conversion needs to happen from innermost functions out; this initial change is focussed on some of those inner functions).fixed a bug where a duplicate config (multipart or otherwise) that gets merge()d would result in that config's hash ending up in both current hashes and old hashes, but it should not have gone into old hashes.
started converting some string concatenation and std::to_string usage to fmt and/or "..."_format instead now that we have those available (via oxen-logging).