Add an option to write a SBOM per binary #563
Closed
Mostly (?) fixes #557
This sticks to having the whole SBOM operate in terms of packages, so that the toplevel element is still a package potentially consisting of several binaries; but there is a SBOM named after every binary, making them much easier to locate.
Why do it this way
So you may be thinking "Eww, gross! Why don't you just make the binary your toplevel element? Why generate a SBOM for the whole package?"
There is a long list of reasons why it is done this way, most of which are weaksauce ("all components are packages with PURLs so it's more consistent", "the per-binary mode isn't special and isn't a source of lots of extra complexity and bugs", etc). The real reason we output a SBOM for the whole package is the SBOMs clobbering each other.
Imagine you have a target that's both a binary and a shared library. The compiled artifacts get different extensions (`.exe` vs `.dll` on Windows, nothing vs `.so` on Linux), etc. However, when you try to emit stuff like debug info for them, the names will clash! Both are `myproject.pdb` on Windows. If you run into this, Cargo will print a warning but not actually do anything to fix this.
We have to deal with this problem somehow as well.
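There are two sources of clobbering:
1. Different targets within the same package (e.g. a library and a binary)
2. Targets from different packages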
We easily sidestep (2) by writing the SBOM into each package's directory. Boom, stuff from different packages cannot clash.
We sidestep (1) by emitting the SBOM for the whole package. So the library and binary targets within the same package can clobber each other as much as they like - it's still going to be the exact same file! You don't have to invent a scheme to differentiate between an rlib, a cdylib and an executable that all got built from the same package - you just look up the name of the binary and off you go.
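To illustrate the reasoning above, here is a minimal sketch of the path scheme, not the code from this PR. The helpers `sbom_path` and `sbom_paths_for_package`, the directory names, and the bare `<name>.<extension>` file name are all assumptions made for the example.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the SBOM for a target lives in its package's
/// directory and is named after the target, without the platform-specific
/// suffix (.exe, .dll, .so), which is not known at this point.
fn sbom_path(package_dir: &Path, target_name: &str, extension: &str) -> PathBuf {
    package_dir.join(format!("{target_name}.{extension}"))
}

/// Hypothetical helper: the distinct SBOM paths for all targets of one package.
/// Targets that share a name (say, a binary and a cdylib) collapse into a
/// single path. That is harmless here because every file in the package
/// holds the same whole-package SBOM.
fn sbom_paths_for_package(
    package_dir: &Path,
    target_names: &[&str],
    extension: &str,
) -> BTreeSet<PathBuf> {
    target_names
        .iter()
        .map(|name| sbom_path(package_dir, name, extension))
        .collect()
}
```

With targets `["myproject", "myproject", "helper"]` in `crates/myproject`, this yields two paths: the two same-named targets share one file, and a same-named target in `crates/other` would land in a different directory and so could not clash.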
In case you're wondering why don't we just name each SBOM after the final name of the binary, `.exe` and all: the answer is that we don't know them. `cargo metadata` does not provide this information. #532 might help with that. But even if that ever happens, that would still be incompatible with the `--target=all` mode.
You may notice that these things are now named after a variable prefix and no longer fit the standard naming convention. And that is absolutely correct! That's a bug that was present all along for `--output-pattern=package` and that we inherited in this PR too. I think there needs to be a `.cdx` in there when it's not `bom.{xml,json}`. That is a change in user-visible behavior so that'll require shipping a 0.5.0 release. I think this is a necessary fix, I'll make a PR for that once this is merged to avoid conflicts.