Release Checklist #229

dtaht · 2023-01-26T23:34:31Z

Checklist items on code passes

Testing

Writing

Outreach plan

thebracket · 2023-02-01T19:44:02Z

Rust dependencies all updated to latest, tweaks made to code where necessary. The GitHub CI now checks for CVEs and obsolete packages as part of the continuous integration run.

interduo · 2023-02-14T16:11:35Z

In my opinion the alpha version is ready to release: we run the current master branch in production and it works really well. This issue should be renamed to "Beta release checklist" once the alpha is out.

It's better to release the alpha early and ship follow-up versions often than to aim for really big milestones. What do you think?

dtaht · 2023-02-14T17:12:32Z

Mentally I have targeted an alpha release for the end of the month. I would like the vast majority of this checklist to have gone through by then. I also have an increasing desire to move settings out of ispConfig.py and into the TOML file, where the bridge configuration would be shared with the Rust side and setup would be simpler for new users. Lastly, I would like to find and on-board at least two new users, to find the things that those with experience with the product aren't finding, before declaring that state. In the latter two cases it is not so much the existing users I care about as the cost of supporting and on-boarding the next 100, or 1000. Everything we can do to improve usability before the alpha or final release, with the resources available, will pay off down the line.

I am glad that the code is considered stable enough to be in production.

I have mostly been focusing on the math (which has some problems), a decent sim of real RTTs, and the netlink/sampling problems, none of which are barriers to the alpha. Some of the items on the checklist look easy, like coping with licensing issues and verifying the Python is up to date: @rchac ?

As always I seek consensus on all we do or plan, and we have a meeting this Thursday at 1 PM PST to discuss the remaining 31 open issues here: https://github.com/LibreQoE/LibreQoS/issues?q=is%3Aopen+is%3Aissue+milestone%3Av1.4 which we can punt, modify, or fix.

thebracket · 2023-02-14T22:30:19Z

I did a global cargo fmt run, so the Rust side is formatted consistently (yes, it recurses).

I've moved a couple of trivial locks to atomics, for a not-really-measurable performance change (an uncontested mutex lock is approximately 13 nanoseconds in userspace on an 8-core AMD FX at 3.6 GHz; that's VERY hard to beat).
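As a rough illustration of the lock-to-atomic change described above, here is a minimal sketch. The names `RELOAD_REQUIRED`, `request_reload` and `take_reload_request` are assumptions made up for this example, not identifiers from the LibreQoS code base.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical flag; the real name in the code base may differ.
// Previously this kind of state would have been a bool inside a lock.
static RELOAD_REQUIRED: AtomicBool = AtomicBool::new(false);

/// Mark that a reload is needed. No lock is taken.
fn request_reload() {
    RELOAD_REQUIRED.store(true, Ordering::Relaxed);
}

/// Return true if a reload was pending, clearing the flag in the same atomic step.
fn take_reload_request() -> bool {
    RELOAD_REQUIRED.swap(false, Ordering::Relaxed)
}

fn main() {
    request_reload();
    if take_reload_request() {
        println!("reloading");
    }
}
```

`Ordering::Relaxed` is enough for a lone flag like this one. If the flag were used to publish other data, a Release store paired with an Acquire load would be the safer choice.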

I've abandoned the effort to use lock-free structures because they are consistently outperformed by locks in the benchmarking I performed. An RwLock wrapped update of the TC queue statistics structure was consistently faster than a similar update in a lock-free structure (tested with DashMap and Crossbeam's SkipMap - the latter has horrible usage semantics). I saw a similar lack of improvement for unlocking per-host throughput data.
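For readers unfamiliar with the pattern, an RwLock-wrapped update of a statistics table could look something like the sketch below. `QueueStats`, `StatsTable` and the field names are hypothetical stand-ins, and this is not the benchmark code referred to above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-in for per-queue statistics.
#[derive(Default, Debug, Clone, PartialEq)]
struct QueueStats {
    packets: u64,
    drops: u64,
}

struct StatsTable {
    inner: RwLock<HashMap<u32, QueueStats>>,
}

impl StatsTable {
    fn new() -> Self {
        Self { inner: RwLock::new(HashMap::new()) }
    }

    /// Take the write lock once and apply a whole batch of updates.
    fn update_batch(&self, batch: &[(u32, u64, u64)]) {
        let mut table = self.inner.write().unwrap();
        for &(queue_id, packets, drops) in batch {
            let entry = table.entry(queue_id).or_default();
            entry.packets = entry.packets.saturating_add(packets);
            entry.drops = entry.drops.saturating_add(drops);
        }
        // The write guard is dropped here, which releases the lock (RAII).
    }

    /// Readers share the lock and copy out what they need.
    fn get(&self, queue_id: u32) -> Option<QueueStats> {
        self.inner.read().unwrap().get(&queue_id).cloned()
    }
}

fn main() {
    let stats = StatsTable::new();
    stats.update_batch(&[(1, 100, 2), (2, 50, 0), (1, 10, 1)]);
    println!("{:?}", stats.get(1));
}
```

Taking the lock once per batch, rather than once per entry, is one plausible reason a plain lock can keep up with per-key concurrent maps when updates arrive in bulk.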

A few minutes ago an advisory hit about Rocket; I'll update when the fix exists. It's pretty trivial and doesn't seem to make us vulnerable to anything. The audit system alerted me to it.

thebracket · 2023-03-23T18:07:26Z

I've put up a PR that checks for GPL3 in the Rust side of things (PR #292). There isn't any. I haven't looked at the Python side.

thebracket · 2023-03-23T18:26:34Z

For the other items:

- "Format the code consistently" - cargo fmt will do that for you at any time. Yes, it's recursive.
- "System error checking"
  - "All ? squashed in Rust" - error messages are in pretty good shape. The ? operator has its place and is used decently.
  - Checking for EINTR, EBUSY, ENOACCESS, ENOBUFS, E_WHY_DOESNT_C_HAVE_GOOD_ERROR_HANDLING and so on? Nope. Not even going there. You get an error message out of Rust functions that failed, indicating an error code. Sitting around adding manual checking for every possible error code is not a good use of anyone's time when there aren't issue reports relating to something failing that needs an error path.
- Running strace. Well, I guess if it makes you happy, have fun with that. Be sure to Google normal behavior. EAGAIN, for example, is part of normal futex operation...
- log_once - not on my radar. I hate that macro. You introduce state into a simple operation, and now you're clogging up RAM storing previous log entries.
- Math pass. We've found some NaNs and similar in regular debug cycles and fixed them.
- "Does it make sense to use f32 or f64 anywhere?" What a strange question. There are at least 28 places where we use f32 in the Rust code base. What specifically isn't working?
- "SIMD pass". There's SIMD code generated by the compiler all over the place (in 64-bit mode, anyway). Are there any specific hotspots you feel need optimizing? Those would be good topics for bug reports.
- "Structure size pass" - I've no idea what you're talking about. The biggest wasted space is the Cake stats we don't read.
- "Heap table sizes and allocations/reallocations". Again, give me a specific pain point, including how it is negatively affecting performance, and I'll look at it.
- "tracking locks" - the vast majority of frequently updated tables are lock free, and everything else uses RAII to ensure acquire/release are guaranteed.
- "Eliminate unnecessary string parsing and flinging around". The bulk of the string parsing is in the queue stats. Not parsing fields we don't care about might help while queues are being watched. Again, though - to dedicate time to this you're going to have to find a case in which performance is adversely affected.
- "Eliminating O(n) behaviors. O(log(n)) (at least) should always be a goal. Test 10x the actual size of the typical dataset today (e.g. 100k customers)". Duh. You're not the only one who has studied algorithm theory, including the part you didn't mention: even O(n^2) is fine if you don't do it often and n is low. Again, present cases of actual performance problems and we can look at the algorithms that are in use. (And bear in mind that the eBPF code is the "hot path" - it runs per packet.)

So I've done the parts of the "Checklist items on code passes" that make sense. The rest read more like an abstract guide on code creation, minus the parts about not optimizing things that don't matter.
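To make the point about `?` and error codes concrete, here is a small sketch using only the standard library. The function names and the path are hypothetical and chosen so that the open fails. It shows that an error propagated with `?` still carries its kind and, where the OS supplied one, the raw errno.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Read a file to a string; any failure is passed up to the caller with `?`.
fn read_config(path: &str) -> io::Result<String> {
    let mut text = String::new();
    File::open(path)?.read_to_string(&mut text)?;
    Ok(text)
}

/// Turn an error into a one-line message, including the OS error code if there is one.
fn describe(err: &io::Error) -> String {
    match err.raw_os_error() {
        Some(code) => format!("{err} (kind: {:?}, errno: {code})", err.kind()),
        None => format!("{err} (kind: {:?})", err.kind()),
    }
}

fn main() {
    // Hypothetical path, chosen so that the open fails.
    match read_config("/nonexistent/lqos-example.conf") {
        Ok(text) => println!("read {} bytes", text.len()),
        Err(e) => eprintln!("could not read config: {}", describe(&e)),
    }
}
```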

dtaht · 2023-03-25T19:04:40Z

I appreciate all your pithy comments. Try to remember that we will one day onboard devs far less experienced than us, and rather than hovering over every line of every commit, I would like basic checklist items such as these to be deeply embedded. Someday, perhaps, there will be more.

Pieces of feedback on your feedback:

Asking that the checklist be checked off is something that has to happen on every major release. It is an assurance, to those not deep in the code, that the dev has checked and is sure those things have been dealt with.
"Does it make sense to use double or floats" is related to the loss of precision that might happen if those are used. I was essentially asking: did anything need to be a double, based on the data we were aggregating? We presently get data with a dynamic range of billions (nsec to sec), which is outside the precision of normal floats.
Running strace on the whole application(s) lets us see what system calls are used and make doubly, extra, super sure that all possible error returns are successfully coped with.
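The float-precision concern can be checked directly. An f32 has a 24-bit significand, so it cannot hold both "one second" and "one nanosecond" in the same value. The sketch below uses made-up helper names and round-trips a nanosecond count through f32 and f64.

```rust
/// Convert nanoseconds to f32 and back, to show what is lost.
fn roundtrip_f32(nanos: u64) -> u64 {
    (nanos as f32) as u64
}

/// The same round trip through f64, which is exact for values below 2^53.
fn roundtrip_f64(nanos: u64) -> u64 {
    (nanos as f64) as u64
}

fn main() {
    let one_second_plus_1ns: u64 = 1_000_000_001;
    println!("f32: {}", roundtrip_f32(one_second_plus_1ns));
    println!("f64: {}", roundtrip_f64(one_second_plus_1ns));

    // Accumulating in f32 stalls once the running total reaches 2^24.
    let mut total = 16_777_216.0_f32;
    total += 1.0;
    println!("f32 total after adding 1: {}", total);
}
```

The f32 round trip drops the extra nanosecond, and the f32 accumulator stops growing at 16,777,216. The f64 round trip is exact. Whether any of the existing f32 uses actually aggregate over such a range is a separate question that this sketch does not answer.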

2.1) EAGAIN, EBUSY, etc. can and will bite you in high-speed interactions with the kernel. I am merely going to wait until it bites you as hard as it has bitten me before stressing this point any more.
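One generic way to cope with transient errors such as EINTR and EAGAIN is a bounded retry wrapper. The sketch below is an illustration only, with a hypothetical `retry_transient` helper. It is not how LibreQoS handles these cases, and EBUSY would need its own policy.

```rust
use std::io::{self, ErrorKind};

/// Run `op`, retrying when the error is transient
/// (`Interrupted` maps to EINTR, `WouldBlock` to EAGAIN/EWOULDBLOCK).
/// Gives up after `max_retries` retries and returns the last error.
fn retry_transient<T>(
    max_retries: u32,
    mut op: impl FnMut() -> io::Result<T>,
) -> io::Result<T> {
    let mut attempts = 0;
    loop {
        match op() {
            Err(e)
                if matches!(e.kind(), ErrorKind::Interrupted | ErrorKind::WouldBlock)
                    && attempts < max_retries =>
            {
                attempts += 1;
                std::thread::yield_now();
            }
            // Success, a non-transient error, or retries exhausted.
            other => return other,
        }
    }
}

fn main() {
    // Simulated system call that reports EAGAIN twice before succeeding.
    let mut calls = 0;
    let result = retry_transient(5, || {
        calls += 1;
        if calls < 3 {
            Err(io::Error::from(ErrorKind::WouldBlock))
        } else {
            Ok(calls)
        }
    });
    println!("{:?}", result);
}
```

Some standard library helpers, such as `write_all` and `read_exact`, already retry on `Interrupted`, so a wrapper like this matters mainly for raw system call paths.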

We have not actually tested a real workload any greater than what is deployed in the field. This is hung up on me constructing a decent enough sim.
In C, dealing with many heap allocs and frees is always begging for trouble. It is always faster to parse a string down to a real value.

I agree that presently we are collecting and keeping around too much useless data. Also, ideally we move away from parsing system tool output and toward programming the kernel more directly.

My principal use for log_once is inside of large loops that might throw a ton of errors, which will perturb the concurrency of other operations. More than once in this process, spamming the log has caused other problems.
SIMD is merely a look-at thing, in terms of structuring data and code so that it could be parallelized if necessary.
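A log-once facility does not have to store previous log entries. The sketch below is a hypothetical `log_once!` macro, not an existing one from the project. It keeps a single atomic flag per call site, which is enough to stop a hot loop from spamming the log.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Returns true only for the first caller to flip `flag`.
fn first_time(flag: &AtomicBool) -> bool {
    !flag.swap(true, Ordering::Relaxed)
}

/// Hypothetical `log_once!`: each call site gets its own one-bit flag,
/// so no previous log text is kept in memory.
macro_rules! log_once {
    ($($arg:tt)*) => {{
        static SEEN: AtomicBool = AtomicBool::new(false);
        if first_time(&SEEN) {
            eprintln!($($arg)*);
        }
    }};
}

fn main() {
    for i in 0..1_000 {
        if i % 2 == 0 {
            // Printed once, not 500 times.
            log_once!("even packet index seen (first was {})", i);
        }
    }
}
```

The trade-off is that only the first occurrence is reported. A rate-limited variant would need a counter or timestamp in place of the boolean.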

Aside from that, we can work on coding guidelines and other items to try and

Very happy to see you take the time to review this checklist, and express your feelings about it!

dtaht · 2023-03-25T19:09:05Z

Moving to two items I did not check off:

It is nice to have a grip on structure sizes. Traditionally I tried to construct something that took every structure we had and showed the size of it. Even more so, it is nice to have a grip on possible memory leaks, the why of their growth patterns, and where they may be coming from.

As for tracking heap allocations, well, you just ran into that problem in the Chrome bug you have been coping with. When you have a program that needs to run without leaks for months at a time, even the smallest leak will bite you.

Both of these are just nice to haves at this point.
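A structure-size report of the kind described above is cheap to produce in Rust with `std::mem::size_of`. The sketch below uses two made-up `repr(C)` structs to show how field order changes padding. The `report!` macro is likewise just an assumption for this example.

```rust
use std::mem::{align_of, size_of};

// Hypothetical structs; `repr(C)` keeps fields in declaration order,
// as code shared with C would require.
#[repr(C)]
struct Padded {
    flag: u8,
    bytes: u64,
    kind: u8,
}

#[repr(C)]
struct Reordered {
    bytes: u64,
    flag: u8,
    kind: u8,
}

/// Print one report line per listed type.
macro_rules! report {
    ($($t:ty),* $(,)?) => {
        $(
            println!(
                "{:<20} size={:>3} align={}",
                stringify!($t),
                size_of::<$t>(),
                align_of::<$t>()
            );
        )*
    };
}

fn main() {
    report!(Padded, Reordered, Option<Box<u64>>, String);
}
```

Without `repr(C)`, the Rust compiler is free to reorder fields itself, so the padding difference shown here mostly matters for structures that cross a C boundary.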

thebracket · 2023-03-25T19:20:31Z

One of the reasons I adopted Jem allocator is that it has a lot of tooling built in. I'll see about grabbing some output from it, e.g. https://gist.github.com/ordian/928dc2bd45022cddd547528f64db9174

Strings are mostly a problem at the C boundaries in Rust. Once in a Rust string, it's a smart pointer that stores its length (null termination is awful, Wirth solved it in the 70s...). It deallocates as soon as it goes out of scope (the same mechanism as a C++ destructor), and dangling pointers and use-after-free literally won't compile. So strings are a performance concern, but not a safety issue anymore. (Memory leaks are not part of Rust's safety guarantee, but you really have to work to make them by accident.)
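The scope-based deallocation described above can be observed with a small sketch. `Tracked` and `drops_in_scope` are hypothetical names for this example. A wrapper around a `String` counts its own drops, and the string's length is read without scanning for a terminator.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper that counts how many times it is dropped.
struct Tracked(String);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
    }
}

/// Creates a value in an inner scope and returns how many drops that scope caused.
fn drops_in_scope() -> usize {
    let before = DROPS.load(Ordering::Relaxed);
    {
        let t = Tracked(String::from("eth0\0ignored"));
        // The length is stored next to the pointer; an embedded NUL is just a byte.
        assert_eq!(t.0.len(), 12);
    } // `t` and its heap buffer are freed here.
    DROPS.load(Ordering::Relaxed) - before
}

fn main() {
    println!("drops: {}", drops_in_scope());
    // Leaking takes a deliberate call such as Box::leak or std::mem::forget.
    let leaked: &'static mut Tracked = Box::leak(Box::new(Tracked(String::from("kept"))));
    println!("leaked value still usable: {}", leaked.0);
}
```

Reference cycles built from `Rc` are the usual way to leak without an explicit call, which is why long-running processes still benefit from allocator statistics.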

interduo · 2024-02-06T17:15:23Z

What is the release checklist for v1.5, and what is "the date"? :)

thebracket · 2024-02-06T18:09:41Z

When it's done! We don't have a formal list at this time. I hope to include (not complete):

Unified configuration (no separate /etc/lqos.conf and ispConfig.py in separate places). There's a PR for that, so it's almost done.
Per-flow accounting for RTT, the ability to ignore hosts, ignore flows below a certain traffic level, ASN-level analysis.
Some improvements under the hood for how the eBPF runs/communicates with userspace.
(Hopefully) the "virtual node" thing, where you can have a hierarchy in network.json but mark nodes "virtual" - so they don't build a hierarchical shaper node - but ARE counted for stats. So you could mirror your actual topology and see where the traffic and delays are without having to cram things together.
Hopefully some improvements to binpacking.

interduo · 2024-05-22T19:13:15Z

What is the blocker for releasing v1.5rc1?

thebracket · 2024-05-23T15:06:55Z

Several merges, some testing and a handful of minor bugs in the configuration system. That, and we have day jobs!

interduo · 2024-05-30T17:46:24Z

> Several merges, some testing and a handful of minor bugs in the configuration system. That, and we have day jobs!

Are Deb package builds for the develop branch available (daily, on demand, or per commit)?

dtaht added this to the v1.4 milestone Jan 26, 2023

thebracket added a commit that referenced this issue Feb 1, 2023

Update Tokio and all dependency versions based on local run of cargo audit and cargo outdated. Part of ISSUE #229

d198c0f

thebracket added a commit that referenced this issue Feb 1, 2023

Add "cargo outdated" checks to the GitHub CI workflow. Part of ISSUE #229

3991aa4

thebracket added a commit that referenced this issue Feb 1, 2023

Finish Rust package update cycle per ISSUE #229

38a2a78

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - Add CVE database check to cargo lock scanning in audit phase.

568d55f

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - YAML really wants to eat my brain tonight

ecc2fb5

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - Remove github provided cargo audit, it doesn't support code in subdirectories!

d14c423

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - try to manually run cargo audit

5a1a3db

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - Add division by zero guard to the busy_quantile function.

309b399

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - More saturating_sub instead of just subtracting to avoid underflows.

e2508a6

thebracket added a commit that referenced this issue Feb 1, 2023

ISSUE #229 - Use checked add with 0 default when adding packet counters.

cb3fa88
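The arithmetic guards named in the commit titles above (a division-by-zero guard, `saturating_sub`, and a checked add with a 0 default) follow a common pattern. The sketch below is a guess at what such guards look like in general. The function names are hypothetical and the bodies are not taken from the commits.

```rust
/// Difference between two counter readings; clamps at 0 instead of wrapping
/// (or panicking in debug builds) when the counter went backwards.
fn counter_delta(current: u64, previous: u64) -> u64 {
    current.saturating_sub(previous)
}

/// Sum of two packet counters, falling back to 0 on overflow.
fn add_packets(a: u64, b: u64) -> u64 {
    a.checked_add(b).unwrap_or(0)
}

/// Fraction of busy samples; returns 0.0 rather than dividing by zero.
fn busy_fraction(busy: u64, total: u64) -> f64 {
    if total == 0 {
        0.0
    } else {
        busy as f64 / total as f64
    }
}

fn main() {
    println!("{}", counter_delta(5, 10));
    println!("{}", add_packets(u64::MAX, 1));
    println!("{}", busy_fraction(1, 0));
}
```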

dtaht changed the title ~~Obsolete packages pass~~ Alpha Release Checklist Feb 2, 2023

dtaht pinned this issue Feb 2, 2023

thebracket added a commit that referenced this issue Feb 14, 2023

ISSUE #229 - Change "reload required" from a lock with a bool inside to a proper atomic bool, for a completely unnoticable performance improvement.

c096226

thebracket added a commit that referenced this issue Feb 14, 2023

#229 - Run cargo fmt to format everything.

088f608

dtaht changed the title ~~Alpha Release Checklist~~ Release Checklist Mar 19, 2023

thebracket mentioned this issue Mar 23, 2023

Rust licenses #292

Merged

dtaht modified the milestones: v1.4, v1.5 Beta Mar 14, 2024

rchac modified the milestones: v1.5 Beta, v1.6 May 30, 2024
