Update status periodically in smart terminals #2516

digit-google · 2024-10-29T14:14:23Z

Ensure that time-related NINJA_STATUS formatters such as %w as properly updated when running in a smart terminal.

This fixes issue #2515 by ensuring that Ninja can refresh its status even when only long commands are running.

Modify Subprocess::DoWork() to accept a timeout, and return an enum describing the exit condition.
Add Status::Refresh() and StatusPrinter::Refresh() to update the status on smart terminals.
Modify RealCommandRunner to call DoWork() with a timeout, and call status->Refresh() on expiration to trigger the update.

The end result is that the status is properly updated in smart terminals when a single long command is invoked (see the example build plan in #2515).

NOTE: This adds a new regression test in misc/output_test.py, however, the test is flaky since it is time-dependent, i.e. it runs successfully on a lightly loaded laptop, but fails frequently on Github CI. Hence it is disabled by default but can be run manually during development.

jhasse · 2024-10-31T13:21:48Z

Respect for the low-level timer implementations!

I'm actually working on a complete rewrite of Ninja's frontend that also fixes this. It does change a lot other stuff too though.

In order to have this working without an environment variable and to fix a few issues with ninja's frontend that require breaking changes, my plan would be to do this in ninja 2.0. This new v2 would live alongside ninja v1 so that we can test stuff and experiment with ideas (you mentioned some in #2515 (comment)). The new v2 would be heavier and shipped as a second binary (see #1499). I will open a PR for that in the next few days.

digit-google · 2024-10-31T14:04:32Z

Interesting, would you care sharing your plans about this rewrite? This might have a very important impact on the Fuchsia fork, as we try very hard to stay close to upstream (i.e. I routinely rebase our stack of patchs on top of upstream Ninja to make them manageable without any merges).

Since we are sharing our intentions:

I didn't mention here is that for the Fuchsia fork, which also needs to implement a client/server mode, we refactored Ninja considerably to introduce an AsyncLoop abstraction which allows for waiting for different types of I/O events, and made SubprocessSet use it for both Posix and Win32. This also supports timers, which are used in our implementation of StatusTable, which is how we print running commands in smart terminals.

I didn't want to introduce such level of complexity here, and I don't think upstream needs the AsyncLoop interface at this point.

I have also another branch to implement StatusTable in upstream Ninja that uses a simpler version of the class (i.e. one which doesn't depend on AsyncLoop at all). It hasn't been uploaded as a PR yet, but works on top of this PR properly.

For the record, this has been used by my team for over a year, and has helped tremendously in identifying build commands that were bogging the build for no good reason, which led to several build performance improvements that would not have been possible with Ninja's current output. Which is why I think having such a feature in v1 would benefit

Finally, we are working to introduce in our own fork a new Status sub-class to generate a Build Event Protocol stream. The BEP originated in Bazel but has become a format supported by many other build / CI analytics tools, it is a binary protobuf with a well-known fixed schema that provides rich information about what's happening during the build, is easy to convert to other formats (in particular into perfetto traces), and doesn't have the problems of JSON (slow encoding + bad non-UTF8 support). We do not expect upstream to accept such a feature though, but any change to how the frontend works is important for us. Surprisingly it doesn't add much code to the source tree.

FWIW, the Android Ninja fork already has such a feature, except that they use their own custom protobuf schema, which includes far more metrics in the event stream, but we prefer to stick to BEP for compatibility. They have been using it for years and are very happy with it.

Finally, I am a little wary about delivery. A major rewrite like the one you describe would take a large amount of work, and I didn't have the feeling that you have a lot of time to work on this. How many months / years are you forecasting for such work? I am really curious to know.

jhasse · 2024-10-31T16:23:52Z

Interesting!

I think some kind of async event loop is a good idea. I've used Boost.Asio since I'm not good enough to implement that from scratch, but I guess Boost is banned at Google because NIH? I guess a hand-crafted solution is also much faster to compile, results in a smaller binary and can be tuned for performance.

Those extensions you have in your Fuchsia branch are very interesting! That's why I dig the idea of a second v2 binary / folder: I could basically merge PRs that only touch v2 right away without much considerations, because it would be marked as "beta" and I don't have to worry about breaking something for existing users.

These are the breaking changes I have in mind:

After all the nice frontend improvements were added, make them the default instead of just options.
Make description part of the status format string in order to allow to change its position, see Extend format status output - Allow to change also order of description in NINJA_STATUS #1045
Introduce a --status command line flag, see Ninja status format string from argv. #1180, and remove support for NINJA_STATUS env var.
Add support for NINJA_FLAGS request: new environment variable NINJA_OPTIONS (or NINJA_FLAGS) #797. That's the replacement for NINJA_STATUS then, too.
Always realpath everything in order to fix mixing depfile absolute paths with manifest relative paths breaks rebuilds #1251.
Only use hashes instead of timestamps, see Option to use file characteristics instead of timestamps #1459.
To mitigate the performance impact of the previous two points: Add client/server mode.
Look at a lot more places for build manifest, e.g. in build/ or parent directories, see option to search in parent directories for build.ninja #1328.
Automatically configure CMake, etc. projects. I dream of a README.md of a C++ project which just lists ninja as the build instructions :D
Replace getopt with a good C++ arg parsing library. getopt has caused problems in the past, the recent case being your --jobserver flag. When doing that, also improve the argument names / format / etc.
Completely drop support for building ninja on old systems. This holds us back and if someone wants to compile ninja on their toaster they should just use samurai instead (or v1).

** IMPORTANT NOTE BEFORE PEOPLE START COMPLAINING: ** These are only ideas!

How many months / years are you forecasting for such work? I am really curious to know.

As that would be in parallel I would guess years? Because it's a second binary and the source would be in a v2 directory, I could quickly merge PRs that only touch on that code. My hope is that this encourages people to help out.

For the past years there were many good PRs that added useful functionality but weren't merged because there's always the risk of breaking something. I want to change that while keeping the stable ninja v1.

btw: My changes don't touch the existing code that much, nearly nothing at all. So you shouldn't have any problem rebasing the Fuchsia fork on top of them :)
I really should find the time to open the PR and show you my prototype ...

Allow the DoWork() method to return after a given timeout has expired. The new method returns a WorkResult value which is an enum that can take three values to indicate process completion, user interruption or that the timeout has expired.

Add a method to the abstract Status interface to refresh the status after some time has passed. Implement it properly in StatusPrinter. + Fix a bug in FormatProgressMessage() which ignored the value pass as argument (time_millis), and was using a member value instead (time_millis_). In practice, they were the same, but this is no longer true after this change.

Use the new Subprocess::DoWork(int64_t) method to wait for at most one second before updating the status in the Ninja terminal. NOTE: A new output_test.py is added to check this feature, but since it is time-dependent, it tends to fail on Github CI so is disabled by default. It is possible to run it manually though on a lightly-loaded machine.

digit-google added 2 commits November 11, 2024 13:35

digit-google force-pushed the periodic-status-updates branch from bd66585 to a8d5a9f Compare November 11, 2024 12:46

digit-google force-pushed the periodic-status-updates branch from a8d5a9f to c36bdc2 Compare November 11, 2024 13:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update status periodically in smart terminals #2516

Update status periodically in smart terminals #2516

digit-google commented Oct 29, 2024

jhasse commented Oct 31, 2024 •

edited

Loading

digit-google commented Oct 31, 2024

jhasse commented Oct 31, 2024 •

edited

Loading

Update status periodically in smart terminals #2516

Are you sure you want to change the base?

Update status periodically in smart terminals #2516

Conversation

digit-google commented Oct 29, 2024

jhasse commented Oct 31, 2024 • edited Loading

digit-google commented Oct 31, 2024

jhasse commented Oct 31, 2024 • edited Loading

jhasse commented Oct 31, 2024 •

edited

Loading

jhasse commented Oct 31, 2024 •

edited

Loading