Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve incremental build: make ninja handle dynamic outputs (continuation) #2366

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

HampusAdolfsson
Copy link
Contributor

This is a continuation of #1953, since it's gone stale. I'll copy the original description for convenience:

This PR is about making ninja being aware of dynamic outputs, it is related to this discussion on google groups: https://groups.google.com/g/ninja-build/c/Upz3hvKV8ro.

The Problem

The general problem I am trying to solve is improving the correctness of the incremental build. I think we can say the incremental build is correct when the result of running ninja is the same as doing a full build (enforcing all rule command to be executed again). Today there are at least two scenarios where this is not the case:

If you modified an output (manually or by mistake), ninja will not rebuilt. (this problem is addressed in the PR #1951)
If you rule command create files that cannot be predicted, i.e dynamic outputs, ninja will not be aware about those file and won't re-run the rule command if those files are modified or deleted. (this is the problem addressed by this PR)

Example In Practice

In my project, we are currently having a C++ code generatorwhich generates several header files out of the same source of data (one header per class). If someone delete or modified the header files in local, ninja will not be aware of that and won't re-run the generator. This issue can break the build or worse result in unexpected behavior of the application. When people report issues which could be related to incremental build issues, the first step is often "have you rebuild from scratch?". I would like to avoid the situation where people have to rebuilt from scratch to be sure their build state is correct.

My Solution

One solution to make ninja aware about dynamic outputs, is to have a mechanism similar to depfile to inform ninja about dynamic outputs during the build. My implementation introduce the dynout attribute which indicate the path to a file generated by the rule command containing the list of the outputs. The current syntax is simply having file path per line.

Example:

rule cpp_gen: codegen.exe --depfile $out.d --dynout $out.dynout --stamp $out $in
    depfile = $out.d
    dynout = $oud.dynout

build mydata.stamp: cpp_gen mydata.json

The dynamic outputs are stored in the deps log. As a matter of implementation it is very straightforward, I think it is the way to go. But it regarding naming, "deps" would not fit the concept anymore, because deps log will now contain dynamic dependencies as well as dynamic outputs.

I have also added a tool to list all the output including dynamic outputs in order to diagnosis the build.

Besides small fixes and documentation changes (see my commits for details), I've made the following changes:

Handling multi-target edges

Regarding @mathstuf's comment that the depslog supports multi-target edges:

An edge now loads dynamic outputs from the depslog for all its targets/outputs. I'll note that this is not how depslog dependencies are handled for multi-target edges (see LoadDepsFromLog). Depslog dependencies are only loaded for the first output, which seems incorrect to me.

Maintaining deps tool semantics

Changed the deps tool to only output dependencies from the depslog, not dynamic outputs. This means it keeps the same behaviour/semantics as before. I've added a dynouts tool which behaves like deps but instead outputs dynamic outputs from the depslog.

There was also a comment about newlines in file names. The dynout parser does not have any escaping mechanism to allow newlines in file names. I can add this, but I'll leave it to reviewers to decide whether that's necessary (does ninja supports paths with newlines elsewhere?).


cc: @Dragnalith, @mathstuf

@digit-google
Copy link
Contributor

I think your use case is already supported by dyndeps (which can add dynamic outputs, unlike depfiles), so I am unsure there is a need for such a change now. Maybe @jhasse has a different opinion though?

@jhasse
Copy link
Collaborator

jhasse commented Dec 22, 2023

Hi :) Sorry, I don't have enough free time to take a look at this atm.

@HampusAdolfsson
Copy link
Contributor Author

@digit-google The difference in use case between dyndeps and this dynout concept is the same as between dyndeps and depfile. In theory, instead of using a depfile you could have one edge that asks the compiler to produce header dependencies as a dyndeps file, and one edge that performs the actual compilation and depends on the dyndeps file. In practice, this would force you to parse the source file twice, which is undesirable.

Similarly, this dynout concept is useful when calculating the dynamic output files is expensive, meaning you want to calculate the output files in the same edge as you build them.

In my case dyndeps might be enough, I'll experiment a bit and see what differences there are in build times between dyndeps and dynouts.

@digit-google
Copy link
Contributor

Thanks for the clarification, I agree that dyndeps are a pain to generate (and the Ninja-specific format doesn´t help).

I would favor being able to support specifying additional outputs in the depfiles directly, to keep everything simpler. I don't have the time to look at that until next year, but I remember that currently the depfile loader just verifies that all outputs listed in the file are also listed as edge outputs in the manifest, and will complain otherwise. I hope changing this will not break other use cases though, or will not require Ninja-specific changes to the depfile format. The depslog should be relatively easy to adapt though.

I probably won't have the time to look at your change until next year. I recommend you refactor / rebase your commits into a smaller stack, getting rid of the random fixes, to make it easier to review.

@mathstuf
Copy link
Contributor

While I see the value in having ninja know about dynamic outputs, how are they intended to be used during a build? If the use case is just to allow ninja -t clean to know about and clean up extra bits, that's fine. My concern is about how a build system (e.g., CMake or Meson) is expected to utilize this for something like "compile each dynamic output source file using rule X" or the like. I doubt it is here since that involves dynamic nodes which, AFAIK, is very not-supported in ninja's current design (cf. #760).

There are then followups for things like installing generated headers and such that have no concrete representation in ninja.

I think a section on limitations might be warranted?

Other things that come to mind that may warrant a test case (I only skimmed; maybe these are covered):

  • build with dynout file in state A with output set X; update file to state B with output set Y and clean
    • do files in X and not Y get cleaned?
  • build with dynout file in state A with output set X; update file to state B with output set Y, build and then clean
    • do files in X and not Y get cleaned?
  • dynout lists "special" files in the output; do they get cleaned correctly?
    • directory (that is empty)
    • directory with undeclared contents (possibly from other rules that don't declare dynout)
    • directory listed before declared dynout outputs (the contents need removed before the directory)
    • symlink
    • broken symlink
    • platform-specific special files (named pipes, sockets, FIFO, junctions)
    • documentation on what is supported (maybe with warnings when noticing during the build?)

@jlonnberg
Copy link

@mathstuf
To address some of the questions here. This feature is primarily for the clean command in combination with with avoiding unwanted rebuilds.
A typical use-case here may be a build statement that produces output A explicitly and B implicitly. When cleaning the build, we want both A and B removed, but to do so currently, we'll need to regenerate the build.ninja file and add B as an explicit output. This also has unwanted effects, as the same build statement will now be treated as dirty as ninja_deps has no record of B.

I'm not sure that systems such as CMake will utilize this atm, but for other users it may be a good feature to improve the clean command.

To try to answer some of your questions here:

  • A clean command should remove any files recorded as an explicit output and previously reported dynout files. So changing the explicit output from X to Y will remove Y and any recorded dynout.
  • The clean command for dynout should work as cleaning a node with explicit output.

Does this sound reasonable?

@mathstuf
Copy link
Contributor

Ok, I just wanted to be clear on the intended scope. It seems to be mostly about -t clean. If that's the case, this seems fine to me (as in CMake doesn't really need to do anything at the moment as there's no special abstraction for this right now, but a feature request would be fine). I'd still like to see the test cases I enumerated considered and either implemented or explicitly declared as "not relevant". Some of them are possible problems with static outputs as well FWIW.

src/dynout_parser.cc Outdated Show resolved Hide resolved
src/dynout_parser.cc Outdated Show resolved Hide resolved
src/dynout_parser.cc Outdated Show resolved Hide resolved
}

}
bool DynoutParser::Parse(State* state, DiskInterface* disk_interface,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Call this function "Load()" since it doesn't only parse, but also modifies the graph non-trivially.

src/dynout_parser.cc Outdated Show resolved Hide resolved
src/dynout_parser.cc Outdated Show resolved Hide resolved
src/dynout_parser.cc Outdated Show resolved Hide resolved
src/dynout_parser.h Outdated Show resolved Hide resolved
src/dynout_parser.h Outdated Show resolved Hide resolved
@@ -147,7 +147,7 @@ bool DependencyScan::RecomputeNodeDirty(Node* node, std::vector<Node*>* stack,
if (!edge->deps_loaded_) {
// This is our first encounter with this edge. Load discovered deps.
edge->deps_loaded_ = true;
if (!dep_loader_.LoadDeps(edge, err)) {
if (!dep_loader_.LoadDeps(edge, err) || !dep_loader_.LoadImplicitOutputs(edge, err)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are modifying the logic of graph.cc in non-trivial way, please provide unit-tests that verify the new behavior in the same commit to make it easier to verify that everything works correctly.

Consider splitting this commit into two: one that introduces the Dynout parser/loader + its own unit-tests (without modifying the graph), then another one that uses the new information to modify the graph + appropriate unit-tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added tests in graph_test.cc and a few more build and clean tests to verify the new behaviour. Also separated the dynout parser into its own commit as you suggested.

@@ -35,7 +35,7 @@ using namespace std;
// The version is stored as 4 bytes after the signature and also serves as a
// byte order mark. Signature and version combined are 16 bytes long.
const char kFileSignature[] = "# ninjadeps\n";
const int kCurrentVersion = 4;
const int kCurrentVersion = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving the code that modifies the deps log format to its own commit for clarity. Adjust tests appropriately in the same one if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@HampusAdolfsson
Copy link
Contributor Author

@digit-google I think I've addressed all your comments, please let me know if there's anything else that needs work.

The one major change I've made is that the build graph is no longer modified when parsing dynout files during the build. Dynamic outputs are now added to the build graph only at the start of the build. During the build, only the deps log is modified. This is in line with how dynamic dependencies work.

As @johanneslerch pointed out, updating the graph during the build can produce a temporarily invalid graph. Dynamic outputs (like dynamic dependencies) are not intended to dynamically change the build plan, so we need only require that they form a valid graph at the start (and end) of a build.

@mathstuf To address your comments:

  • build with dynout file in state A with output set X; update file to state B with output set Y and clean
    • do files in X and not Y get cleaned?

Correct, the output set is only updated when the edge is built, meaning that only the files in X are cleaned. I've added a test case for this.

  • build with dynout file in state A with output set X; update file to state B with output set Y, build and then clean
    • do files in X and not Y get cleaned?

The build updates the output set, so the clean will remove the files in Y. This is covered by testing that:

  • Dynamic outputs discovered during a build are saved to the deps log
  • A clean removes dynamic outputs saved in the deps log
  • dynout lists "special" files in the output; do they get cleaned correctly?

The cleaner makes no difference between dynamic and explicit outputs, so all these special files work exactly the same as when declared as explicit outputs in a build statement. I don't think there is a need for tests or documentation of such special behaviour in relation to this feature.

  • directory listed before declared dynout outputs (the contents need removed before the directory)

The deps log preserves the order of dynamic outputs, so they are cleaned in the order they are declared. It is up to the command generating the dynout file to ensure that files are listed before any parent directories. This is consistent with how build statements work, where outputs are cleaned from first to last.

@HampusAdolfsson HampusAdolfsson force-pushed the dynamic_outputs branch 2 times, most recently from e391fe7 to f6ff662 Compare May 20, 2024 14:57
Dragnalith and others added 2 commits June 12, 2024 10:51
This is to support dynamically computed outputs for the 'dynout'
attribute
Co-authored-by: Hampus Adolfsson <[email protected]>
@HampusAdolfsson
Copy link
Contributor Author

My change of no longer modifying the graph during the build had the side effect of dynamic outputs no longer being stored in the build log, and causing some edges to be considered dirty on the second build. I've changed the build log so that it takes dynamic outputs as an extra parameter when recording a command, and updated the tests to cover this case.

@ValeanuO
Copy link

ValeanuO commented Aug 5, 2024

Hi. Regarding the change to add the dynamic outputs to the build graph only at the start of the build:

What would be the behavior when the dynamic outputs are read from dynout file in case the file is missing? For example, the rule command didn't generate the list of outputs so the dynout file doesn't exist at that time.

@johanneslerch
Copy link

Please see #1953 (comment)
We did some testing of this PR and believe to observe some problems.

@HampusAdolfsson
Copy link
Contributor Author

Hi. Regarding the change to add the dynamic outputs to the build graph only at the start of the build:

What would be the behavior when the dynamic outputs are read from dynout file in case the file is missing? For example, the rule command didn't generate the list of outputs so the dynout file doesn't exist at that time.

If an edge has no entry in the deps log and has a missing dynout file, it is marked as dirty.

Please see #1953 (comment) We did some testing of this PR and believe to observe some problems.

Your examples exposed a bug in the error handling when loading dynouts, hence the rebuilds when using the file containing the conflicting outputs. I've fixed this, and it now properly aborts the build for that file.

As you've noticed, dynout files are not well suited to use cases where an output is sometimes dynamically detected and sometimes declared in the ninja file. If an output is sometimes dynamically detected, I'd suggest sticking to using dynout files even when the output is known beforehand. If that doesn't work, dyndep files may support your use case better, since they can modify the build graph during a build.

Dragnalith and others added 2 commits August 22, 2024 13:40
'outputs' lists all the outputs generated by the graph, including dynamic
outputs.

'dynouts' lists all dynamic outputs stored in the deps log.

Co-authored-by: Hampus Adolfsson <[email protected]>
@johanneslerch
Copy link

johanneslerch commented Aug 22, 2024

Thanks for the update. We will have a closer look at it.
We're still struggling to understand exactly the difference between dynout and dyndep. We've also experimented a little by changing dyndep to make it work for a single build statement (without the need to have 2 build statements). You can find our changes here: https://github.com/johanneslerch/ninja/commits/dyndep-single-statement/
In our tests that worked, though, we focused on dynamic outputs, only.
We don't want to divert the discussion in your PR too much, so we will proceed with creating a separate PR with these changes, after we rebased everything to the master (right now it's based on v1.11.1).

//edit: PR is here #2481

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants