Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release Checklist #229

Open
11 of 33 tasks
dtaht opened this issue Jan 26, 2023 · 14 comments
Open
11 of 33 tasks

Release Checklist #229

dtaht opened this issue Jan 26, 2023 · 14 comments
Milestone

Comments

@dtaht
Copy link
Collaborator

dtaht commented Jan 26, 2023

Checklist items on code passes

  • update obsolete python packages
  • update obsolete rust packages, check for security problems
  • Check for licensing issues - GPLv3 is a no-no at this time for final objects
  • Format the code consistently (does this actually recurse presently?)
  • System Error checking (all ? squashed in rust, all returns from c functions errnos checked), retrying with random exponential backoff where desirable on EINTR, EBUSY, ENOACCESS, ENOBUFS
  • syscall pass use strace -p pid -c to summarize times, see man page for many other options. Also tracking what syscalls are used feeds back into checking for errors, and reducing the number of syscalls reduces the probability of errors. Getting some data from production is also very useful here.
  • log_once or warn_once where appropriate rather than spamming the log
  • Math pass - check for possible div/0, nan, infinities and overflows, and algerbraic vs integer mistakes like summing x/100 + x1/100 + x2/100 rather than (x1 + x2 + x3)/100
  • Does it make sense to use floats or doubles anywhere?
  • SIMD pass - check for possible simd opportunities without necessarily screwing with the data flow overmuch
  • structure size pass (at least in debug)
  • heap table sizes and allocations/reallocations
  • tracking locks
  • eliminate unnecessary string parsing and flinging around
  • Eliminating O(n) behaviors. O(log(n)) (at least) should always be a goal. Test 10x the actual size of the typical dataset today (e.g. 100k customers)

Testing

  • make a deb
  • Get tyr up on it's own vlans
  • Install deb from scratch on tyr
  • pound it flat with the test suite
  • install deb on virtual machines

Writing

  • Release Notes
  • Release Announcement
  • Blog and cool examples from the field
  • Mission Statement
  • Competitive Analysis
  • Code of Conduct
  • v1.5 roadmap
  • Workflow process

Outreach plan

  • Podcast appearance (rule11? brothers wisp?)
  • Conference talks
  • Press of any sort
  • Independent reviews
  • Twitter, etc
@dtaht dtaht added this to the v1.4 milestone Jan 26, 2023
thebracket added a commit that referenced this issue Feb 1, 2023
of cargo audit and cargo outdated.

Part of ISSUE #229
thebracket added a commit that referenced this issue Feb 1, 2023
thebracket added a commit that referenced this issue Feb 1, 2023
@thebracket
Copy link
Collaborator

Rust dependencies all updated to latest, tweaks made to code where necessary. The GitHub CI now checks for CVEs and obsolete packages as part of the continuous integration run.

@dtaht dtaht changed the title Obsolete packages pass Alpha Release Checklist Feb 2, 2023
@dtaht dtaht pinned this issue Feb 2, 2023
@interduo
Copy link
Collaborator

In my opinion alpha version is ready to release (as we use current master branch state in production and it really do well work now). This issue should be renamed to "Beta release checklist" after alpha comes out.

It's better to release alpha early and often next versions than doing really big milestones. What do You think?

@dtaht
Copy link
Collaborator Author

dtaht commented Feb 14, 2023

Mentally I have targetted an alpha release for the end of the month. I would like the vast majority of this checklist to have gone through by then. Also I have an increasing desire to move stuff out of Ispconfig.py and into the toml, where it would share the configuration for the bridge with the rust, and the setup simplified more for new users. Lastly, I would like to find and on-board at least two new users to find the things that those with experience with the product aren't finding before declaring that state. In the latter two cases, it is not the existing users' I care so much about, but the costs of supporting and on-boarding the next 100, or 1000, and everything we can do to improve usability before the alpha or final release, with the resources available will pay off down the line.

I am glad that the code is considered stable enough to be in production.

I have mostly been focusing on the math (which has some problems), a decent sim of real RTTs, and the netlink/sampling problems, none of which are barriers to the alpha. Some of the items on the checklist, look easy, like coping with licensing issues and verifying the python is up to date: @rchac ?

As always I seek consensus on all we do or plan, and we have a meeting this thursday 1PM PST to discuss the remaining 31 open issues here: https://github.com/LibreQoE/LibreQoS/issues?q=is%3Aopen+is%3Aissue+milestone%3Av1.4 which we can punt, modify, or fix.

thebracket added a commit that referenced this issue Feb 14, 2023
to a proper atomic bool, for a completely unnoticable performance
improvement.
@thebracket
Copy link
Collaborator

I did a global cargo fmt run, so the Rust side is formatted consistently (yes, it recurses).

I've moved a couple of trivial locks to atomics, for a not-really-measurable performance change (an uncontested mutex lock is approximately 13 nanoseconds in userspace on an 8 core AMD Fx at 3.6 ghz; that's VERY hard to beat).

I've abandoned the effort to use lock-free structures because they are consistently outperformed by locks in the benchmarking I performed. An RwLock wrapped update of the TC queue statistics structure was consistently faster than a similar update in a lock-free structure (tested with DashMap and Crossbeam's SkipMap - the latter has horrible usage semantics). I saw a similar lack of improvement for unlocking per-host throughput data.

A few minutes ago an advisory hit about Rocket; I'll update when the fix exists. It's pretty trivial and doesn't seem to make us vulnerable to anything. The audit system alerted me to it.

@dtaht dtaht changed the title Alpha Release Checklist Release Checklist Mar 19, 2023
@thebracket
Copy link
Collaborator

I've put up a PR that checks for GPL3 in the Rust side of things. There isn't any. I haven't looked at the Python side. (PR #292 )

@thebracket
Copy link
Collaborator

For the other items:

  • "Format the code consistently" - cargo fmt will do that for you at any time. Yes its recursive.
  • "System error checking"
    • "All ? squashed in Rust" - error messages are in pretty good shape. The ? operator has its place and is used decently.
    • Checking for EINTR, EBUSY, ENOACCESS, ENOBUFS, E_WHY_DOESNT_C_HAVE_GOOD_ERROR_HANDLING and so on? Nope. Not even going there. You get an error message out of Rust functions that failed, indicating an error code. Sitting around adding manual checking for every possible error code is not a good use of anyone's time when there aren't issue reports relating to something failing that needs an error path.
  • Running strace . Well, I guess if it makes you happy have fun with that. Be sure to Google normal behavior. EAGAIN, for example, is part of normal futex operation...
  • log_once - not on my radar. I hate that macro. You introduce state into a simple operation, and now you're clogging up RAM storing previous log entries.
  • Math pass. We've found some NaNs and similar in regular debug cycles and fixed them.
  • "Does it make sense to use f32 or f64 anywhere?" What a strange question. There's at least 28 places where we use f32 in the Rust code base. What specifically isn't working?
  • "SIMD pass". There's SIMD code generated by the compiler all over the place (in 64-bit mode, anyway). Are there any specific hotspots you feel need optimized? Those would be good topics for bug reports.
  • "Structure size pass" - I've no idea what you're talking about. The biggest wasted space are the Cake stats we don't read.
  • "Heap table sizes and allocations/reallocations". Again, give me a specific pain point, including how it is negatively affecting performance, and I'll look at it.
  • "tracking locks" - the vast majority of frequently updated tables are lock free, and everything else uses RAII to ensure acquire/release are guaranteed.
  • "Eliminate unnecessary string parsing and flinging around". The bulk of the string parsing is in the queue stats. Not parsing fields we don't care about might help while queues are being watched. Again, though - to dedicate time to this you're going to have to find a case in which performance is adversely affected.
  • "Eliminating O(n) behaviors. O(log(n)) (at least) should always be a goal. Test 10x the actual size of the typical dataset today (e.g. 100k customers)". Duh. You're not the only one who has studied algorithm theory, including the part you didn't mention that even O(n^2) is fine if you don't do it often and there's a low n. Again, present cases of actual performance problems and we can look at the algorithms that are in use. (And bear in mind that the eBPF code is the "hot path" - runs per packet)

So I've done the parts of the "Checklist items on code passes" that make sense. The rest read more like an abstract guide on code creation, minus the parts about not optimizing things that don't matter.

@dtaht
Copy link
Collaborator Author

dtaht commented Mar 25, 2023

I appreciate all your pithy comments. Try to remember that we will one day on board devs far less experienced than us, and rather than hanging over every line of commit, I like to have deeply embedded the basic checklist items such as these. Someday, perhaps, there will be more.

Pieces of feedback on your feedback:

  1. Asking that the checklist be checked off, is something that has to happen on every major release. It is an assurance, to those not deep in the code, that the dev has and is sure those things have been dealt with.

  2. "Does it make sense to use double or floats", is related to the loss of precision that might happen if those are used, so I was asking essentially, did anything need to be a double, based on the data we were aggregating? We presently get data that has a dynamic range of billions (nsec to sec), which is outside the precision of normal floats.

  3. strace on the whole application(s), lets us see what system calls are used and make doubly, extra, super sure, that all possible error returns are successfully coped with.

2.1) EAGAIN, EBUSY, etc can and will bite you on high speed interactions with the kernel. I am merely going to wait until it bites you as hard as it has bitten me, before stressing on this point anymore.

  1. We have not actually tested a real workload any greater than what is deployed in the field. This is hung up on me constructing a decent enough sim.

  2. In C, dealing with many heap allocs and frees, is always begging for trouble. It is always faster to parse a string down to a real value.

I agree, that presently, we are collecting and keeping around too much useless data. Also, ideally we move away from parsing system tool output into more directly programming the kernel.

  1. My principal use for log_once is inside of large loops that might throw a ton of errors, which will permute the concurrency of other operations. More than once in this process, spamming the log has caused other problems.

  2. Simd is merely a look-at thing, in terms of structuring data and code so that it could be parallized if necessary.

Aside from that, we can work on coding guidelines and other items to try and

Very happy to see you take the time to review this checklist, and express your feelings about it!

@dtaht
Copy link
Collaborator Author

dtaht commented Mar 25, 2023

Moving to two items I did not check off:

It is nice to have a grip on structure sizes. Traditionally I tried to construct something that took every structure we had and showed the size of it. Even more so, it is nice to have a grip on possible memory leaks and the why of their growth patterns, and from what they may be coming from.

As for tracking heap allocations, well, you just ran into that problem in the chrome bug you have been coping with with. When you have a program that needs to run without leaks, for months at a time, even the smallest leak, will bite you.

Both of these are just nice to haves at this point.

@thebracket
Copy link
Collaborator

thebracket commented Mar 25, 2023 via email

@interduo
Copy link
Collaborator

interduo commented Feb 6, 2024

What is release checklist for v1.5 and what is "the date"?:)

@thebracket
Copy link
Collaborator

When it's done! We don't have a formal list at this time. I hope to include (not complete):

  • Unified configuration (no separate /etc/lqos.conf and ispConfig.py in separate places). There's a PR for that, so its almost done.
  • Per-flow accounting for RTT, the ability to ignore hosts, ignore flows below a certain traffic level, ASN-level analysis.
  • Some improvements under the hood for how the eBPF runs/communicates with userspace.
  • (Hopefully) the "virtual node" thing, where you can have a hierarchy in network.json but mark nodes "virtual" - so they don't build a hierarchical shaper node - but ARE counted for stats. So you could mirror your actual topology and see where the traffic and delays are without having to cram things together.
  • Hopefully some improvements to binpacking.

@dtaht dtaht modified the milestones: v1.4, v1.5 Beta Mar 14, 2024
@interduo
Copy link
Collaborator

What is the blocker for releasing v1.5rc1?

@thebracket
Copy link
Collaborator

Several merges, some testing and a handful of minor bugs in the configuration system. That, and we have day jobs!

@interduo
Copy link
Collaborator

Several merges, some testing and a handful of minor bugs in the configuration system. That, and we have day jobs!

Deb package build for develop branch are available (daily/ondemand/commit)?

@rchac rchac modified the milestones: v1.5 Beta, v1.6 May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants