
☭ Liberate workers! ☭ #7570

Open · wants to merge 15 commits into master

Conversation

@s0me0ne-unkn0wn (Contributor) commented Aug 1, 2023

This is a proposed follow-up of #7337.

Comrade @mrcnski did a great job splitting out the worker binaries, and it's time to finish what we've started.

Building the worker binaries as part of the root polkadot package ties all three binaries to the same set of features, dependencies, global allocator, and other things that are better kept separate. So, in the name of revolution, we have to liberate the workers completely.

The proposal is to create separate binary packages node-core-pvf-{prepare|execute}-worker-cli that accept independent sets of features as needed. Being members of the same workspace, they will share the same target directory with the main binary.

A more straightforward way would be to transform node-core-pvf-{prepare|execute}-worker into binary packages, but that doesn't work as their exported entrypoints are used in other places (like puppet-worker).
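For illustration, each of those CLI packages would be little more than a main() delegating to the existing worker library. A rough sketch (the entrypoint name and signature below are placeholders, not the actual exported API):

```rust
// Sketch only: a thin `polkadot-prepare-worker` binary crate that delegates to
// the library crate. `prepare_worker_entrypoint` stands in for whatever
// entrypoint the worker crate actually exports (e.g. via decl_worker_main).
fn main() {
    let args: Vec<String> = std::env::args().collect();
    polkadot_node_core_pvf_prepare_worker::prepare_worker_entrypoint(&args);
}
```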

TODO:

  • Build workers automagically when malus is built, if possible
  • Test thoroughly that this change doesn't break worker versioning during development builds again
  • CI/CD fixes, if needed

@mrcnski (Contributor) left a comment

Cool, thanks comrade. I tried something like this in #7337 as I was iterating over solutions but I didn't know about default-members. Looks like it is the secret to this working.

I would just restructure it so that the crates are close to the root, maybe in a bin folder or something. They are direct dependencies of polkadot, so we shouldn't tuck them away in node/core/pvf/blah/cli/.

node/core/pvf/prepare-worker/build.rs (review thread; outdated, resolved)
Cargo.toml (review thread; outdated, resolved)
@@ -79,7 +79,7 @@ build-malus:
     - .compiler-info
     - .collect-artifacts
   script:
-    - time cargo build --locked --profile testnet --verbose -p polkadot-test-malus
+    - time cargo build --locked --profile testnet --verbose -p polkadot-test-malus -p polkadot-execute-worker -p polkadot-prepare-worker
@s0me0ne-unkn0wn (Contributor, Author) Aug 2, 2023

I tried different approaches and ended up explicitly building workers for malus.

This one even works but has some flaws and looks too hacky to end up in the production code.

Any better ideas are appreciated.

@mrcnski (Contributor) Aug 4, 2023

Is it possible to have a separate workspace for malus, then build it the exact same way we do polkadot? i.e. if we list the workers as default-members, we don't have to specify them with -p for each build command.

I think this would work if the polkadot workspace didn't list malus, and the malus Cargo.toml depended on polkadot and listed the workers as workspace members and default-members. Worth a try at least.

> This one even works but has some flaws and looks too hacky to end up in the production code.

Yeah that looks too hacky to me, but -p .. is fine if nothing else works.

@s0me0ne-unkn0wn (Contributor, Author)

> Is it possible to have a separate workspace for malus

Unfortunately, no; nested workspaces are proposed but not supported yet. I went through a lot of interesting proposals while researching possible solutions, like intersecting workspaces and binary dependencies, but none of them is implemented yet.

Contributor

> I went through a lot of interesting proposals while researching possible solutions ... but none of them is implemented yet.

Yeah I know the feeling. I think -p .. is fine, maybe just update the readme with build instructions like we did for node metrics.

@s0me0ne-unkn0wn (Contributor, Author)

I've found out that cargo test --workspace --profile testnet triggers building some auxiliary binaries like adder-collator and undying, which are required for tests residing in tests/. But I still need to understand what mechanism triggers those builds. Probably it can be used for malus as well.
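One mechanism that can produce this: Cargo builds a package's own binary targets before compiling that package's integration tests and exposes their paths through CARGO_BIN_EXE_<name> env vars. A hedged sketch, with an illustrative binary name, of a test relying on that:

```rust
// Sketch only: an integration test inside the package that declares the
// `adder-collator` binary. Cargo builds the binary first and sets the
// CARGO_BIN_EXE_adder-collator env var at compile time of the test.
#[test]
fn auxiliary_binary_is_built() {
    let collator = env!("CARGO_BIN_EXE_adder-collator");
    assert!(std::path::Path::new(collator).exists());
}
```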

@mrcnski (Contributor) commented Aug 4, 2023 via email

@s0me0ne-unkn0wn (Contributor, Author)
I got seriously stuck on building the workers for integration tests, but I hope I've finally sorted it out, and I'm testing an approach that seems to work now.

One thing I realized during the research is that we don't really need the puppet workers currently used in integration tests. The real workers now duplicate their functionality and could be used instead. The only things the puppets have and the real workers don't are a few test-only commands like exit and sleep. Those could easily be integrated into the real workers and gated behind #[cfg(test)].

That's just a thought that crossed my mind; I'm putting it here so it doesn't get lost.

@mrcnski (Contributor) commented Aug 6, 2023

@s0me0ne-unkn0wn Good idea, just keep in mind that #[cfg(test)] does not work across crates. (Maybe we can instead use the new test-utils feature.) But if that works, it would be a nice simplification to remove decl_puppet_worker_main as a follow-up. Instead we would directly use decl_worker_main for the puppet workers. That would have nice advantages: we would always test the real worker code directly, and we wouldn't have to keep the puppet and real workers in sync anymore.
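A rough sketch of how such feature-gating could look (the feature name follows the test-utils suggestion above; the command set is purely illustrative):

```rust
// Sketch only: test-only commands compiled into the real worker when a
// `test-utils` cargo feature is enabled. Names are illustrative.
pub enum WorkerCommand {
    Prepare { artifact_path: std::path::PathBuf },
    #[cfg(feature = "test-utils")]
    Exit,
    #[cfg(feature = "test-utils")]
    Sleep { secs: u64 },
}

pub fn handle(cmd: WorkerCommand) {
    match cmd {
        WorkerCommand::Prepare { artifact_path } => {
            // The real preparation path would go here.
            let _ = artifact_path;
        },
        #[cfg(feature = "test-utils")]
        WorkerCommand::Exit => std::process::exit(0),
        #[cfg(feature = "test-utils")]
        WorkerCommand::Sleep { secs } =>
            std::thread::sleep(std::time::Duration::from_secs(secs)),
    }
}
```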

@s0me0ne-unkn0wn (Contributor, Author)
Okay, so I've added two dummy integration tests solely for the purpose of building the binaries when integration tests are run. I'm not a fan of this approach, but it works. They could probably be turned into useful tests executing some parachain blocks, but we already have such tests in the adder-collator and undying-collator packages, and it doesn't make much sense to duplicate them here.

@mrcnski (Contributor) commented Aug 8, 2023

@s0me0ne-unkn0wn Hmrph, not a fan of assert!(true); 🙃 Maybe we can try to test something useful to justify these dummy tests, e.g. make sure the files exist and are executable.

I'm glad you got it working (cargo is sadly limited), but what is the need for the new lib.rs files? Is it possible to simplify? Btw one of the new files has a typo (intergation.rs).

@s0me0ne-unkn0wn (Contributor, Author)
> Maybe we can try to test something useful to justify these dummy tests, e.g. make sure the files exist and are executable.

A useful test would be great here, but making sure that the worker binary files exist sounds more like a unit test than an integration test, so the paradigm is broken anyway. But yes, I'm still hunting for ideas for useful worker integration tests that don't duplicate the functionality of tests we already have.
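For example, a minimal sketch of such a check (Unix-only; the target path and profile are assumptions, not the actual layout):

```rust
// Sketch only: assert that the freshly built worker binary exists and carries
// an executable bit. The relative path and profile name are illustrative.
#[test]
fn worker_binary_exists_and_is_executable() {
    use std::os::unix::fs::PermissionsExt;

    let path = std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
        .join("../../../../target/testnet/polkadot-prepare-worker");
    let meta = std::fs::metadata(&path).expect("worker binary should have been built");
    assert!((meta.permissions().mode() & 0o111) != 0, "worker binary should be executable");
}
```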

> but what is the need for the new lib.rs files?

I first added them during one of my experiments trying to get this to work, but later decided to keep them to export the binary names to external crates. If somebody decides to change a binary name, they'll need to do it in both the crate manifest and lib.rs; otherwise the unit test will fail. That should prevent broken builds where binaries are renamed and cannot run. Well, probably a somewhat synthetic justification for their existence, but I still believe they are useful (a bit).
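A rough sketch of that arrangement (the manifest check below is illustrative, not the actual unit test):

```rust
// Sketch only: lib.rs re-exports the binary name so other crates can refer to
// it, and a unit test keeps it in sync with the [[bin]] entry in Cargo.toml.
pub const BINARY_NAME: &str = "polkadot-prepare-worker";

#[cfg(test)]
mod tests {
    #[test]
    fn binary_name_matches_manifest() {
        let manifest = include_str!(concat!(env!("CARGO_MANIFEST_DIR"), "/Cargo.toml"));
        // Renaming the binary in Cargo.toml without updating BINARY_NAME (or
        // the other way around) makes this assertion fail.
        assert!(
            manifest.contains(&format!("name = \"{}\"", super::BINARY_NAME)),
            "binary name in lib.rs and Cargo.toml must match",
        );
    }
}
```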

> Is it possible to simplify?

Absolutely! But I hope we'll be able to get rid of all that stuff altogether in paritytech/polkadot-sdk#583.

> Btw one of the new files has a typo (intergation.rs).

Nice catch! I make a typo in that word like 80% of the time when I type it. Will fix.

Comment on lines +17 to +18
/// Prepare worker binary name
pub const BINARY_NAME: &str = "polkadot-prepare-worker";
Contributor

It's the same as PREPARE_BINARY_NAME in pvf-host. Can we move it to the pvf/prepare-worker crate, and use that here, for less duplication?

Contributor

Hmm, there is an issue with the PVF host depending on the worker binaries in this way: the worker binaries will be recompiled after every commit, which will be annoying when hacking on the pvf crate. I think we can move the binary names into pvf-common.
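Roughly, something like this (a sketch; the point is a single shared definition rather than the exact names):

```rust
// Sketch only: define the binary names once, e.g. in node-core-pvf-common, and
// have both the PVF host and the worker CLI crates import them from there.
pub const PREPARE_BINARY_NAME: &str = "polkadot-prepare-worker";
pub const EXECUTE_BINARY_NAME: &str = "polkadot-execute-worker";
```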

build.rs (review thread; outdated)

// assert!(false);

// TODO: opt-level, debug, features, etc.
Contributor

I added this build.rs script to fix the issue where cargo run does not build the worker binaries. This is still WIP but it needs to get out soon, ideally before the next release. I didn't open a separate PR but tagged this onto yours because it depends on the workers being their own crates.

But how do we forward everything from the cargo command that was called? For features, we will have to iterate over all env vars to collect them and pass them on. For things like opt-level, lto, etc. (there are more), I don't see how to pass them on; I think the child cargo process inherits them from env vars, but I'm not sure.
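For the feature part, a hedged sketch of what iterating over the env vars could look like (assuming the script shells out to a child cargo invocation; the package name and details are illustrative):

```rust
// build.rs sketch: collect the features Cargo enabled for this crate from the
// CARGO_FEATURE_* env vars and forward them to a child `cargo build`.
use std::{env, process::Command};

fn main() {
    // NB: the underscore-to-dash mapping is lossy (feature names may contain
    // underscores themselves); shown for illustration only.
    let features: Vec<String> = env::vars()
        .filter_map(|(key, _)| {
            key.strip_prefix("CARGO_FEATURE_")
                .map(|f| f.to_lowercase().replace('_', "-"))
        })
        .collect();

    let mut cmd = Command::new(env::var("CARGO").unwrap_or_else(|_| "cargo".into()));
    // Package name is illustrative. A real script would also likely need a
    // separate --target-dir to avoid deadlocking on the workspace lock.
    cmd.args(["build", "-p", "polkadot-prepare-worker"]);
    if !features.is_empty() {
        cmd.arg("--features").arg(features.join(","));
    }
    let status = cmd.status().expect("failed to spawn child cargo");
    assert!(status.success(), "building the worker binary failed");
}
```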

Contributor

I remembered that we eventually want to build these workers with the musl target, so we'll need this script anyway.

Also, I realized we don't need support for features; the workers don't use any.

@mrcnski (Contributor) Aug 9, 2023

I looked into whether profile settings are passed along, and they are! Env vars like CARGO_CFG_OVERFLOW_CHECKS (see https://doc.rust-lang.org/cargo/reference/environment-variables.html#configuration-environment-variables) are passed along, confirmed by logging all env vars from the execute-worker's build script. So this script is basically good to go, modulo a couple of small TODOs.

Actually, they are not. I ran --profile=testnet, and CARGO_CFG_OVERFLOW_CHECKS and CARGO_CFG_PANIC were set correctly (I guess they are inherited from release), but DEBUG was not set.

We should be able to forward them with rustc -C flags. See here.
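A hedged sketch of that kind of forwarding (the flag mapping and env handling are assumptions, not the actual script; Cargo exposes OPT_LEVEL and DEBUG to build scripts):

```rust
// build.rs sketch: translate the profile settings Cargo exposes to the build
// script into explicit -C codegen flags for a child `cargo build`.
use std::{env, process::Command};

fn main() {
    let mut rustflags = env::var("RUSTFLAGS").unwrap_or_default();
    if let Ok(opt_level) = env::var("OPT_LEVEL") {
        rustflags.push_str(&format!(" -C opt-level={}", opt_level));
    }
    // Assumed mapping: DEBUG=true roughly corresponds to full debuginfo.
    if env::var("DEBUG").map(|v| v == "true").unwrap_or(false) {
        rustflags.push_str(" -C debuginfo=2");
    }

    let status = Command::new(env::var("CARGO").unwrap_or_else(|_| "cargo".into()))
        .args(["build", "-p", "polkadot-prepare-worker"]) // package name illustrative
        .env("RUSTFLAGS", rustflags.trim())
        .status()
        .expect("failed to spawn child cargo");
    assert!(status.success(), "building the worker binary failed");
}
```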

Contributor

I should undo the change removing default-members. I thought it wouldn't be necessary anymore with the build script, but I realized that cargo install will not work for the worker binaries without it.

@mrcnski (Contributor) Aug 11, 2023

I wonder if we can experiment with moving the build script to the cli crate or something; that way it would also work for malus, which depends on that crate. Might make things cleaner. But since we should get this PR in ASAP (hard deadline of Aug 18, before the switch to the monorepo), it can be a follow-up.

Edit: everything in the above comments has been done except this one!

mrcnski and others added 4 commits August 9, 2023 19:17
I looked into whether profile settings are passed along, and they are! So this script is basically good-to-go, modulo a couple of small TODOs.
build.rs (review thread; outdated, resolved)
- [ ] Fix profile
- [ ] Use eprintln!
- [ ] Add test steps
- [ ] Remove some TODOs
- [ ] Add some comments
@mrcnski added the A0-please_review (Pull request needs code review), B0-silent (Changes should not be mentioned in any release notes), C1-low (PR touches the given topic and has a low impact on builders), and D3-trivial 🧸 (PR contains trivial changes in a runtime directory that do not require an audit) labels on Aug 11, 2023.
@mrcnski marked this pull request as ready for review August 11, 2023 14:03
@mrcnski requested review from chevdor and a team as code owners August 11, 2023 14:03
@mrcnski requested review from eskimor and sandreim August 11, 2023 14:03
@paritytech-ci requested a review from a team August 11, 2023 14:03
@mrcnski (Contributor) commented Aug 11, 2023

Let's aim to get this merged by Aug 18:

  1. This fixes cargo run, which would be really nice to get out before the worker binary split ends up in the next release.
  2. We can no longer merge PRs in this repo on (after?) Aug 18, when the move to the monorepo begins.

@paritytech-cicd-pr

The CI pipeline was cancelled due to the failure of one of the required jobs.
Job name: test-linux-stable
Logs: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/3375743
